Abstract: We derive a generalization of the Second Law of Thermodynamics that
uses Bayesian updates to explicitly incorporate the effects of a measurement of a
system at some point in its evolution. By allowing an experimenter’s knowledge to be
updated by the measurement process, this formulation resolves a tension between the
fact that the entropy of a statistical system can sometimes fluctuate downward and the
information-theoretic idea that knowledge of a stochastically-evolving system degrades
over time. The Bayesian Second Law can be written as
$\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq 0$,
where $\Delta H(\rho_m, \rho)$ is the change in the cross entropy between the original phase-space
probability distribution $\rho$ and the measurement-updated distribution $\rho_m$, and
$\langle Q \rangle_{F|m}$ is the expectation value of a generalized heat flow out of the system. We also derive
refined versions of the Second Law that bound the entropy increase from below by
a non-negative number, as well as Bayesian versions of the Jarzynski equality. We
demonstrate the formalism using simple analytical and numerical examples.
1 Introduction


The Second Law of Thermodynamics encapsulates one of the most important facts 
about the macroscopic world: entropy increases over time. There are, however, a 
number of different ways to define “entropy,” and corresponding controversies over
how to best understand the Second Law. In this paper we offer a formulation of the 
Second Law that helps to resolve some of the tension between different approaches, 
by explicitly including the effects of the measurement process on our knowledge of the 
state of the system. This Bayesian Second Law (BSL) provides a new tool for analyzing 
the evolution of statistical systems, especially for small numbers of particles and short 
times, where downward fluctuations in entropy can be important. 

One way to think about entropy and the Second Law, due to Boltzmann, coarse-grains
the phase space $\Gamma$ of a system into macrostates. The entropy of a microstate $x$ is
then given by $S = \log \Omega_x$, where $\Omega_x$ is the volume of the macrostate to which $x$ belongs.
(Throughout this paper we set Boltzmann’s constant $k_B$ equal to unity.) The coarse-graining
itself is subjective, but once it is fixed there is a definite entropy objectively
associated with each microstate. Assuming that the system starts in a low-entropy
state (the “Past Hypothesis”), the Second Law simply reflects the fact that undirected 
evolution is likely to take the state into ever-larger macrostates: there are more ways 
to be high-entropy than to be low-entropy. The Second Law is statistical, in the sense 
that random fluctuations into lower-entropy states, while rare, are certainly possible. 
In many contexts of interest to modern science, from nanoscale physics to biology, 
these fluctuations are of crucial importance, and the study of “fluctuation theorems” 
has garnered considerable attention in recent years [1-7]. 

Another perspective on entropy, associated with Gibbs in statistical mechanics and 
Shannon [8] in the context of information theory, starts with a normalized probability
distribution $\rho(x)$ on phase space, and defines the entropy as $S = -\int dx\, \rho(x) \log \rho(x)$.
In contrast with the Boltzmann formulation, in this version the entropy characterizes 
the state of our knowledge of the system, rather than representing an objective fact 
about the system itself. The more spread-out and uncertain a distribution is, the 
higher its entropy. The Second Law, in this view, represents the influence of stochastic 
dynamics on the evolution of the system, for example due to interactions with a heat 
bath, under the influence of which we know less and less about the microstate of the 
system as time passes. 

For many purposes, the Gibbs/Shannon formulation of entropy and the Second 
Law is more convenient to use than the Boltzmann formulation. However, it raises 
a puzzle: how can entropy ever fluctuate downward? In an isolated system evolving 
according to Hamiltonian dynamics, the Gibbs entropy is strictly constant, rather than 
increasing; for a system coupled to a heat bath with no net energy transfer, it tends
to monotonically increase, asymptoting to a maximum equilibrium value. Ultimately 
this is because the Gibbs entropy characterizes our knowledge of the microstate of the 
system, which only diminishes with time.^1

We can, of course, actually observe the system; if we do so, we will (extremely) 
occasionally notice that it has fluctuated into what we would characterize as a low- 
entropy state from Boltzmann’s perspective. The air in a room could fluctuate into 
one corner, for example, or a cool glass of water could evolve into a warm glass of water 
containing an ice cube. To reconcile this real physical possibility with an information-centric
understanding of entropy, we need to explicitly account for the impact of the act
of measurement on our knowledge of the system. This is the task of Bayesian analysis, 
which shows us how to update probability distributions in the face of new information 
[10, 11]. Since the advent of Maxwell’s demon, measurement in the context of statistical 
mechanics has been explored extensively [12]. This has resulted in a body of literature 
linking information-theoretic quantities to thermodynamic variables [13, 14]. However, 
such analyses only examine the impact of measurement at the point in time when it 
is performed. In the present work, we observe that such measurements also contain
information about the state of the system at earlier points in time, information that has
hitherto been unaccounted for. This results in novel modifications of the Second Law.

The setup we consider consists of a classical system coupled to an environment. The 
dynamics of the system are stochastic, governed by transition probabilities, either due 
to intrinsic randomness in the behavior of the system or to the unpredictable influence 
of the environment. An experimental protocol is determined by a set of time-dependent 
parameters, which may be thought of as macroscopic features (such as the location of 
a piston) controlled by the experimenter. The experimenter’s initial knowledge of the 
system is characterized by some probability distribution; as the system is evolved under 
the protocol for some period of time, this probability distribution also evolves. At the 
end of the experiment, the experimenter performs a measurement. Bayes’s Theorem 
tells us how to update our estimates about the system based on the outcome of the 
measurement; in particular, we can use the measurement outcome to update the final
probability distribution, but also to update the initial distribution. The BSL is a 
relation between the original (non-updated) distributions, the updated distributions, 
and a generalized heat transfer between the system and the environment. 

^1 Boltzmann himself also studied a similar formulation of entropy, which he used to prove his
H-theorem. The difference is that the H-functional represents $N$ particles in one 6-dimensional
single-particle phase space, rather than in a $6N$-dimensional multi-particle phase space. This is not a full
representation of the system, as it throws away information about correlations between particles. The
corresponding dynamics are not reversible, and entropy increases [9].
The Second Law contains information about irreversibility; a crucial role in our
analysis is played by the relationship between transition probabilities forward in time 
and “reversed” probabilities backward in time. Consider a situation in which the system 
in question is an egg, and the experiment consists of holding the egg up and dropping 
it. To be precise, the experimental protocol, which we will call the “forward” protocol, 
is for the experimenter to hold the egg in the palm of her open hand, and then to turn 
her hand over after a specified amount of time. The initial probability distribution
for the particles that make up the egg is one that corresponds to an intact egg in the 
experimenter’s hand. With overwhelming probability the forward protocol applied to 
this initial state will result in an egg on the floor, broken. 

This experiment is clearly of the irreversible type, but we should be careful about 
why and how it is irreversible. If we reversed the velocities of every particle in the 
universe, then time would run backward and the egg would reconstitute itself and fly 
back up into the experimenter’s hand. This sort of fundamental reversibility is not 
what concerns us. For us, irreversibility means that there are dissipative losses to the 
environment; in particular, there are losses of information as the state of the system 
interacts with that of the environment. This information loss is what characterizes 
irreversibility. From the theoretical viewpoint, we should ask what would happen if all 
of the velocities of the broken egg particles were instantaneously reversed, leaving the 
environment alone. Again with overwhelming probability, the egg would remain broken 
on the floor. To make sure the time-dependent actions of the experimenter do not 
affect this conclusion, we should also instruct the experimenter to run her experiment 
in reverse: she should begin with her palm facing downward while the egg is broken on 
the floor, and then turn it upward after a certain amount of time. In this example, the 
effect of reversing the experimental procedure is negligible; the probability that the egg 
will reassemble itself and hop up into her hand is not zero, but it is extremely small. 

The generalization beyond the egg dropping experiment is clear. We have a system
and an environment, and an experimenter who executes a forward protocol, which
means a macroscopic time-dependent influence on the dynamics of the system. The
environmental interactions with the system are deterministic but unknown to the
experimenter, and so the system evolves stochastically from her point of view. She assigns
probabilities to trajectories the system might take through phase space. We will call
these the “forward” probabilities. To isolate potential irreversibility in the system, we
consider reversing all of the velocities of the system’s particles in its final state, and then
executing the “reverse” protocol, which is just the forward protocol backward. The
environment still interacts in an unknown way, so the system again evolves stochastically.
The probabilities that the experimenter assigns to trajectories in this reversed setup
are called the reverse probabilities.
To get precise versions of the Second Law, we will consider a particular information-theoretic
measure of the difference between the forward and reverse probabilities, known
as the relative entropy or Kullback-Leibler divergence [15]. The relative entropy of two 
probability distributions is always non-negative, and vanishes if and only if the two 
distributions are identical. The relative entropy of the forward and reverse probability 
distributions on phase space trajectories is a measure of the irreversibility of the system, 
and the non-negativity of that relative entropy is a precise version of the Second Law. 

The inclusion of Bayesian updates as the result of an observation at the end of the 
protocol leads to the Bayesian Second Law. The BSL can be written in several ways, 
one of which is: 

$$\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq 0. \tag{1.1}$$

Here, $\rho$ is the probability distribution without updating, and $\rho_m$ is the updated
distribution after obtaining measurement outcome $m$. $H = -\int \rho_m \log \rho$ is the cross entropy
between the two distributions. The cross entropy is the sum of the entropy of $\rho_m$ and
the relative entropy of $\rho_m$ with respect to $\rho$; it can be thought of as the average amount
we would learn about the system by being told its precise microstate, if we thought
it was in one distribution (the original $\rho$) but it was actually in another (the updated
$\rho_m$). Like the ordinary entropy, this is a measure of uncertainty: the more information
contained in the (unknown) microstate, the greater the uncertainty. However, the cross
entropy corrects for our false impression of the distribution. The difference in the cross
entropy between the initial and final times is $\Delta H$, and $\langle Q \rangle_{F|m}$ is the expectation value
of a generalized heat transfer between the system and the environment, which
contains information about the irreversibility of the system’s dynamics. Thus, at zero heat
transfer, the BSL expresses the fact that our uncertainty about the system is larger at
the time of measurement, even after accounting for the measurement outcome.
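
The statement that the cross entropy is the sum of an entropy and a relative entropy can be checked numerically on a small discrete state space. The Python sketch below is only an illustration: the three-state distributions `rho` and `rho_m` are made-up numbers, and the helper function names are our own rather than anything defined in the text.

```python
import math

def entropy(p):
    """Gibbs/Shannon entropy S(p) = -sum_x p(x) log p(x), with k_B = 1."""
    return -sum(px * math.log(px) for px in p if px > 0.0)

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x); assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0.0)

# Hypothetical distributions: rho is the original, rho_m the updated one.
rho = [0.5, 0.3, 0.2]
rho_m = [0.8, 0.2, 0.0]

h = cross_entropy(rho_m, rho)     # H(rho_m, rho)
s = entropy(rho_m)                # S(rho_m)
d = relative_entropy(rho_m, rho)  # D(rho_m || rho)
```

In this example `h` agrees with `s + d` up to rounding, and `d` is non-negative, so the cross entropy is never smaller than the entropy of the updated distribution.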

The relative entropy is not only non-negative, it is monotonic: if we apply a stochastic
(probability-conserving) operator to any two distributions, the relative entropy
between them stays constant or decreases. We can use this fact to prove refined versions
of both the ordinary and Bayesian Second Laws, obtaining a tighter bound than zero to
the expected entropy change plus heat transfer. This new lower bound is the relative
entropy between the initial probability distribution and one that has been cycled through
forward and reverse evolution, and therefore characterizes the amount of irreversibility
in the evolution.
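
The monotonicity property can be seen in a small numerical experiment. The following Python sketch applies one stochastic matrix to two distributions and compares their relative entropy before and after. The matrix `T` and the distributions `p` and `q` are arbitrary illustrative choices, not quantities taken from the paper.

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def evolve(p, T):
    """Apply a stochastic matrix: p'(j) = sum_i p(i) T[i][j]; rows of T sum to 1."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]

# Hypothetical 3-state stochastic map (each row sums to 1).
T = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

p = [0.90, 0.05, 0.05]
q = [0.10, 0.20, 0.70]

before = relative_entropy(p, q)
after = relative_entropy(evolve(p, T), evolve(q, T))
```

Here `after` does not exceed `before`: pushing both distributions through the same probability-conserving map can only make them harder to tell apart.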

We also apply our implementation of Bayesian updating to the Jarzynski equality, 
which relates the expected value of the work performed during a process to the change 
in free energy between the initial and final states. Lastly, we illustrate the BSL in
the context of some simple models. These include deriving Boltzmann’s version of the
Second Law within our formalism, and studying the numerical evolution of a randomly
driven harmonic oscillator.

2 Setup 

2.1 The System and Evolution Probabilities 

We are primarily concerned with dynamical systems that undergo non-deterministic
evolution, typically due to interactions with an environment about which the
experimenter has no detailed knowledge. The effect of the unknown environment is to induce
effectively stochastic evolution on the system; as such, we can only describe the state
and subsequent time evolution of the system probabilistically [16]. We are considering
classical mechanics, where probabilities only arise due to the ignorance of the
experimenter, including ignorance of the state of the environment. Analogous equations
would apply more generally to truly stochastic systems, or to stochastic models of
dynamical systems.

The state of the system at time $t$ is therefore a random variable $X_t$ taking values
in a space of states $\Gamma$. We will refer to $\Gamma$ as “phase space,” as if it were a conventional
Hamiltonian system, although the equations apply equally well to model systems with
discrete state spaces. Because the evolution is non-deterministic, we can only give a
probability that the system is in state $x$ at time $t$, which we write as $P(X_t = x)$. This
is a true probability in the discrete case; in the continuous case it is more properly
a probability density that should be integrated over a finite region of $\Gamma$ to obtain a
probability, but we generally will not draw this distinction explicitly. For notational
convenience, we will often write this probability as a distribution function,

$$\rho_t(x) \equiv P(X_t = x), \tag{2.1}$$

which is normalized so that $\int \rho_t(x)\, dx = 1$.

The experimenter has two roles: to manipulate a set of external control parameters
defining the experimental protocol, and to perform measurements on the system.
All measurements are assumed to be “ideal”; that is, the act of measuring any given 
property of the system is assumed to induce no backreaction on its state, and we do 
not track the statistical properties of the measuring device. 

We will primarily be studying experiments that take place over a fixed time interval
$\tau$. The experimental protocol is fully specified by the history of a set of external control
parameters that can change over this time interval, $\lambda_i(t)$. The control parameters $\lambda_i$
specify the behavior of various external potentials acting on the system, such as the
volume of a container or the frequency of optical tweezers. We will refer to the set
$\lambda(t) = \{\lambda_i(t)\}$ of control parameters as functions of time as the “forward protocol.”

The forward protocol and the dynamics of the system together determine the forward
transition function, $\pi_F$, which tells us the probability that the system evolves
from an initial state $x$ at $t = 0$ to a final state $x'$ at $t = \tau$:

$$\pi_F(x \to x') \equiv P(X_\tau = x' \mid X_0 = x; \lambda(t)). \tag{2.2}$$

The transition function is a conditional probability, normalized so that the system
ends up somewhere with probability one:

$$\int \pi_F(x \to x')\, dx' = 1. \tag{2.3}$$

The forward transition function evolves the initial distribution to the final distribution,

$$\rho_\tau(x') = \int dx\, \rho_0(x)\, \pi_F(x \to x'). \tag{2.4}$$

A central role will be played by the joint probability that the system begins at $x$
and ends up a time $\tau$ later at $x'$,

$$P_F(x, x') \equiv P(X_0 = x, X_\tau = x') = \rho_0(x)\, \pi_F(x \to x'), \tag{2.5}$$

which is normalized so that $\int P_F(x, x')\, dx\, dx' = 1$. By summing the joint probability
over $x$ or $x'$ we obtain the distribution functions $\rho_\tau(x')$ or $\rho_0(x)$, respectively:

$$\rho_\tau(x') = \int P_F(x, x')\, dx, \qquad \rho_0(x) = \int P_F(x, x')\, dx'. \tag{2.6}$$
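
For a discrete state space the integrals in (2.4) to (2.6) become sums, and the relations can be verified directly. The Python sketch below uses a hypothetical three-state system; the numbers in `rho_0` and `pi_F` are invented for illustration, with `pi_F[x][x2]` standing in for the transition function from state `x` to state `x2`.

```python
# Hypothetical 3-state example; each row of pi_F sums to 1, as in Eq. (2.3).
rho_0 = [0.5, 0.5, 0.0]
pi_F = [[0.9, 0.1, 0.0],
        [0.1, 0.8, 0.1],
        [0.0, 0.2, 0.8]]
n = len(rho_0)

# Eq. (2.5): joint probability of starting at x and ending at x2.
P_F = [[rho_0[x] * pi_F[x][x2] for x2 in range(n)] for x in range(n)]

# Eq. (2.6): summing over the initial state gives the final distribution,
# and summing over the final state gives back the initial distribution.
rho_tau = [sum(P_F[x][x2] for x in range(n)) for x2 in range(n)]
rho_0_back = [sum(P_F[x][x2] for x2 in range(n)) for x in range(n)]

total = sum(sum(row) for row in P_F)  # normalization of the joint distribution
```

The marginal `rho_tau` coincides with what Eq. (2.4) would give, since both are the same sum over initial states.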


We close this subsection with a brief digression on the probabilities of phase-space
trajectories. The rules of conditional probability allow us to break up the transition
functions based on subdivisions of the time interval $[0, \tau]$. For the special case of a
Markov process, we have the identity

$$\pi_F(x \to x') = \int [dx]\, P(X_\tau = x' \mid X_{t_N} = x_N)\, P(X_{t_N} = x_N \mid X_{t_{N-1}} = x_{N-1}) \cdots P(X_{t_1} = x_1 \mid X_0 = x), \tag{2.7}$$

^2 Here and below we will mostly omit the dependence on the control parameters $\lambda(t)$ from the
notation for brevity. They will return in Section 2.3 when we discuss time-reversed experiments.
where $[dx]$ is the product of all the $dx_k$ and we choose $t_k = k\tau/(N + 1)$. This is familiar
as a discretization of the path integral, and in the continuum limit we would write

$$\pi_F(x \to x') = \int_{x(0) = x}^{x(\tau) = x'} \mathcal{D}x(t)\, \pi_F[x(t)]. \tag{2.8}$$

The functional $\pi_F[x(t)]$ is a probability density on the space of trajectories with fixed
initial position, but with the final position free. To get a probability density on the space
of trajectories with two free endpoints, we just have to multiply $\pi_F[x(t)]$ by the initial
distribution $\rho_0(x)$. The result, which we call $P_F[x(t)]$, is the path-space version of the
joint distribution $P_F(x, x')$. We will not make heavy use of these path-space quantities
below, but the formal manipulations we make with the ordinary transition function
and joint distribution can be repeated exactly with the path-space distributions, and
occasionally we will comment on the path-space versions of our results.
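
In a discrete-state, discrete-time Markov model, the sum over intermediate states in (2.7) is simply a product of one-step transition matrices. The Python sketch below illustrates this for a two-state system. The function `step_matrix` and the flip probabilities in `protocol` are hypothetical stand-ins for the one-step dynamics under a time-dependent control parameter.

```python
def matmul(A, B):
    """Product of two square matrices: (AB)[i][j] = sum_k A[i][k] B[k][j]."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def step_matrix(flip):
    """Hypothetical one-step transition matrix for a two-state system that
    changes state with probability `flip` during the step."""
    return [[1.0 - flip, flip],
            [flip, 1.0 - flip]]

# A made-up protocol: the flip probability changes from step to step.
protocol = [0.1, 0.2, 0.3]

# Summing over intermediate states, as in Eq. (2.7), is a matrix product
# taken in time order.
pi_F = [[1.0, 0.0], [0.0, 1.0]]
for flip in protocol:
    pi_F = matmul(pi_F, step_matrix(flip))
```

Each row of the resulting `pi_F` still sums to one, which is the discrete counterpart of the normalization (2.3).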

2.2 Measurement and Bayesian Updating 

The probability density on phase space can also change through Bayesian updates when
a measurement is made: the experimenter modifies her probabilities to account for the
new information. We will restrict ourselves to measurements performed at time $\tau$,
the end of the experiment, though it is simple to extend the results to more general
measurement protocols. The measurement outcome is a random variable $M$ that only
depends on the state of the system at time $\tau$, not on the prior history of the system.
The measurement is then characterized by the function

$$P(m|x') \equiv P(M = m \mid X_\tau = x') = \text{probability of measurement outcome } m \text{ given state } x' \text{ at time } \tau. \tag{2.9}$$

The updated phase space distribution at time $\tau$ is obtained by Bayes’s rule, which in
this case takes the form

$$\rho_{\tau|m}(x') \equiv P(X_\tau = x' \mid M = m) = \frac{P(m|x')\, \rho_\tau(x')}{P(m)}. \tag{2.10}$$

Here the denominator is $P(m) = \int P(m|y')\, \rho_\tau(y')\, dy'$, and serves as a normalization
factor.

If we know the transition function, we can also update the phase space distribution
at any other time based on the measurement outcome at time $\tau$. Below we will make
use of the updated initial distribution:

$$\rho_{0|m}(x) \equiv P(X_0 = x \mid M = m) = \frac{\rho_0(x) \int dx'\, \pi_F(x \to x')\, P(m|x')}{P(m)}. \tag{2.11}$$
Figure 1: Relationships between the various distribution functions we define: the original
distribution $\rho_0(x)$, its time-evolved version $\rho_\tau(x')$, their corresponding
Bayesian-updated versions $\rho_{0|m}(x)$ and $\rho_{\tau|m}(x')$, and the cycled distributions
$\tilde{\rho}(x)$ and $\tilde{\rho}_m(x)$ discussed in Sections 3.2 and 4.3. Equation numbers refer to where the
distributions are related to each other.


This reflects our best information about the initial state of the system given the outcome
of the experiment; $\rho_{0|m}(x)$ is the probability, given the original distribution $\rho_0(x)$ and
the measurement outcome $m$ at time $t = \tau$, that the system was in state $x$ at time
$t = 0$. For example, we may initially be ignorant about the value of an exactly conserved
quantity. If we measure it at the end of the experiment then we know that it had to
have the same value at the start; this could mean a big difference between $\rho_0$ and $\rho_{0|m}$,
though often the effects will be more subtle. The various distribution functions we
work with are summarized in Figure 1 and listed in Table 1.

Finally, we can update the forward transition functions,

$$\pi_{F|m}(x \to x') \equiv P(X_\tau = x' \mid X_0 = x, M = m) = \frac{\pi_F(x \to x')\, P(m|x')}{\int dy'\, \pi_F(x \to y')\, P(m|y')}, \tag{2.12}$$

and the joint distributions,

$$P_{F|m}(x, x') \equiv P(X_0 = x, X_\tau = x' \mid M = m) = \frac{P(m|x')}{P(m)}\, P_F(x, x') = \rho_{0|m}(x)\, \pi_{F|m}(x \to x'), \tag{2.13}$$

based on the measurement outcome. As we would expect, the updated transition
function evolves the updated distribution from the initial to the final time:

$$\rho_{\tau|m}(x') = \int dx\, \rho_{0|m}(x)\, \pi_{F|m}(x \to x'). \tag{2.14}$$
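
The update rules (2.10) to (2.12) and the consistency relation (2.14) can be checked on a small discrete example. In the Python sketch below, the three-state `rho_0`, `pi_F`, and the likelihood `P_m_given` for a single fixed outcome $m$ are all invented numbers, chosen only so that every denominator is nonzero.

```python
rho_0 = [0.5, 0.5, 0.0]
pi_F = [[0.9, 0.1, 0.0],
        [0.1, 0.8, 0.1],
        [0.0, 0.2, 0.8]]
P_m_given = [1.0, 0.5, 0.0]  # hypothetical P(m|x') for one fixed outcome m
n = len(rho_0)

# Eq. (2.4) and the normalization factor P(m).
rho_tau = [sum(rho_0[x] * pi_F[x][y] for x in range(n)) for y in range(n)]
P_m = sum(P_m_given[y] * rho_tau[y] for y in range(n))

# Eq. (2.10): updated final distribution.
rho_tau_m = [P_m_given[y] * rho_tau[y] / P_m for y in range(n)]

# Eq. (2.11): updated initial distribution. like_0[x] is the probability of
# obtaining outcome m given that the system started in state x.
like_0 = [sum(pi_F[x][y] * P_m_given[y] for y in range(n)) for x in range(n)]
rho_0_m = [rho_0[x] * like_0[x] / P_m for x in range(n)]

# Eq. (2.12): updated transition function (like_0[x] > 0 for every x here).
pi_F_m = [[pi_F[x][y] * P_m_given[y] / like_0[x] for y in range(n)]
          for x in range(n)]

# Eq. (2.14): evolving rho_0_m with pi_F_m should reproduce rho_tau_m.
evolved = [sum(rho_0_m[x] * pi_F_m[x][y] for x in range(n)) for y in range(n)]
```

Both updated distributions remain normalized, and `evolved` matches `rho_tau_m` up to rounding, as (2.14) requires.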
Distribution                        Name                                   Definition
$\rho_0(x)$                         Initial Distribution                   2.1
$\pi_F(x \to x')$                   Forward Transition Function            2.2
$\rho_\tau(x')$                     Final Distribution                     2.4
$P_F(x, x')$                        Joint Forward Distribution             2.5
$P(m|x')$                           Measurement Function                   2.9
$\rho_{\tau|m}(x')$                 Updated Final Distribution             2.10
$\rho_{0|m}(x)$                     Updated Initial Distribution           2.11
$\pi_{F|m}(x \to x')$               Updated Forward Transition Function    2.12
$P_{F|m}(x, x')$                    Updated Joint Forward Distribution     2.13
$\pi_R(\bar{x}' \to \bar{x})$       Reverse Transition Function            2.17
$P_R(x, x')$                        Joint Reverse Distribution             2.18
$P_{R|m}(x, x')$                    Updated Joint Reverse Distribution     2.18
$\tilde{\rho}(x)$                   Cycled Distribution                    3.10
$\tilde{\rho}_m(x)$                 Updated Cycled Distribution            4.16

Table 1: List of named probability distributions and their defining equations. These
are grouped according to whether they are updated and/or time-reversed.

It may seem odd to update the transition functions based on measurements, since in 
principle the original transition functions were completely determined by the stochastic 
dynamics of the system and this is a desirable property that one would like to preserve. 
For this reason, the unupdated transition functions will play a special role below, while 
the updated ones are only used as an intermediate quantity in algebraic manipulations. 

To illustrate these definitions, consider a simple toy model: a collection of $N$
independent classical spins, each of which has a fixed probability to flip its state at
each timestep. In this model it is most intuitive to work with a distribution function
defined on macrostates (total number of up spins) rather than on microstates (ordered
sequences of up/down spins).

The distribution functions relevant to our analysis are illustrated for this toy model
with $N = 100$ spins in Fig. 2. To make the effects of evolution and updating most clear,
we start with a bimodal initial distribution $\rho_0(x)$, uniform on the intervals $0 \leq x \leq 10$
and $90 \leq x \leq 100$. The system is evolved for a short time $\tau$, not long enough to attain
the equilibrium distribution, which would be a binomial centered at $x = N/2 = 50$. The
final distribution $\rho_\tau(x')$ therefore has two small peaks just above and below $x' = 50$.
We then perform a measurement, which simply asks whether most of the spins are up
or down, obtaining the answer “mostly down.” This corresponds to a measurement
function

$$P(m|x) = \begin{cases} 1 & \text{if } x \leq 50, \\ 0 & \text{if } x > 50. \end{cases} \tag{2.15}$$

In Fig. 2 we have plotted the normalized version $P(m|x)/P(m)$. From this we can
construct the updated final and updated initial distributions, using (2.10) and (2.11).
The updated final distribution is just the left half of the non-updated final distribution,
suitably renormalized. The updated initial distribution is a re-weighted version of the
non-updated initial distribution, indicating that there is a greater probability for the
system to have started with very few up spins (which makes sense, since our final
measurement found that the spins were mostly down). This toy model does not have
especially intricate dynamics, but it suffices to show how our evolution-and-updating
procedure works.

Figure 2: The various distribution functions illustrated within a toy model of 100
independent spins with a fixed chance of flipping at every timestep. The distributions
are normalized functions on the space of the total number $x$ of up-spins. We consider
an initial distribution (thick solid blue line) that is equally split between the intervals
$x \leq 10$ and $90 \leq x$. The system is evolved for enough time to come close to equilibrium
but not quite reach it, as shown by the final distribution (thin solid red line). A
measurement is performed, revealing that less than half of the spins are up (dot-dashed
purple line). We can therefore update the post-measurement final distribution (dashed
red line). The corresponding updated initial distribution (dotted blue line) is similar
to the original initial distribution, but with a boost at low $x$ and a decrease at high $x$.
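
The spin toy model is simple enough to reproduce in a few lines of Python. The sketch below is not the code used to produce Fig. 2. The per-step flip probability `FLIP` and the number of timesteps `STEPS` are not given in the text and are assumed values, and we also assume that the interval endpoints are included in the initial distribution and that the boundary case of exactly 50 up-spins counts as "mostly down."

```python
import math

N = 100      # number of spins (from the text)
FLIP = 0.05  # assumed per-step flip probability (not given in the text)
STEPS = 20   # assumed number of timesteps (not given in the text)

def binom_pmf(n, p):
    """Binomial probabilities for k = 0..n successes out of n trials."""
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def one_step_matrix(n, flip):
    """T[x][x2]: probability that x up-spins become x2 up-spins in one step."""
    T = [[0.0] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        down_flips = binom_pmf(x, flip)    # up-spins that flip down
        up_flips = binom_pmf(n - x, flip)  # down-spins that flip up
        for k, pk in enumerate(down_flips):
            for j, pj in enumerate(up_flips):
                T[x][x - k + j] += pk * pj
    return T

def evolve(vec, T):
    """Forward evolution of a distribution: vec'(x2) = sum_x vec(x) T[x][x2]."""
    return [sum(vec[x] * T[x][x2] for x in range(len(vec)))
            for x2 in range(len(T[0]))]

def pull_back(vec, T):
    """Backward propagation of a likelihood: vec'(x) = sum_x2 T[x][x2] vec(x2)."""
    return [sum(T[x][x2] * vec[x2] for x2 in range(len(vec)))
            for x in range(len(T))]

T = one_step_matrix(N, FLIP)

# Bimodal initial distribution, uniform on 0..10 and 90..100 (inclusive).
support = list(range(0, 11)) + list(range(90, 101))
rho_0 = [1.0 / len(support) if x in support else 0.0 for x in range(N + 1)]

rho_tau = rho_0
for _ in range(STEPS):
    rho_tau = evolve(rho_tau, T)

# Measurement outcome "mostly down": P(m|x) = 1 for x <= 50, else 0.
P_m_given = [1.0 if x <= N // 2 else 0.0 for x in range(N + 1)]
P_m = sum(l * r for l, r in zip(P_m_given, rho_tau))

rho_tau_m = [l * r / P_m for l, r in zip(P_m_given, rho_tau)]  # Eq. (2.10)

likelihood_0 = P_m_given
for _ in range(STEPS):
    likelihood_0 = pull_back(likelihood_0, T)
rho_0_m = [r * l / P_m for r, l in zip(rho_0, likelihood_0)]   # Eq. (2.11)
```

With these assumed parameters, `rho_0_m` puts more weight on the low-$x$ interval than `rho_0` does, which is the qualitative behaviour described above. Other choices of `FLIP` and `STEPS` change how pronounced the effect is.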

2.3 The Reverse Protocol and Time Reversal 

The Second Law contains information about the irreversibility of the time-evolution of 
the system, so to derive it we need to specify procedures to time-reverse both states 
and dynamics. Specifically, we will define an effectively “time-reversed” experiment 
that we can perform whose results can be compared to the time-forward experiment. 
As discussed in the Introduction, the point here is not to literally reverse the flow 
of time upon completion of the time-forward experiment (which would just undo the 
experiment), but to isolate the effects of dissipative processes, like friction, which result 
from complicated interactions with the environment. 

For a state $x$, we denote by $\bar{x}$ the time-reversed state. In a ballistic model of
particles, $\bar{x}$ is just the same as $x$ with all of the particle velocities reversed. We are only
talking about the velocities of the particles that make up the system, not the
environment. In practice, an experimenter is not able to control the individual velocities of all
of the particles in the system, so it may seem pointless to talk about reversing them.
It will often be possible, however, to set up a time-reversed probability distribution
$\bar{\rho}(x) \equiv \rho(\bar{x})$ given some procedure for setting up $\rho(x)$. For instance, if the system
has a Maxwellian distribution of velocities with zero center-of-mass motion, then the
probability distribution on phase space is actually time-reversal invariant.

Time reversal of dynamics is simpler, primarily because we have only limited
experimental control over them. The system will have its own internal dynamics, it will
interact with the environment, and it will be influenced by the experimenter. In a real
experiment, it is only the influence of the experimenter that we are able to control,
so our notion of time reversal for the dynamics is phrased purely in terms of the way
the experimenter decides to influence the system. The experimenter influences the
system in a (potentially) time-dependent way by following an experimental protocol, $\lambda(t)$,
which we have called the “forward protocol.” The forward protocol is a sequence of
instructions to carry out using some given apparatus while the experiment is happening.
We therefore define a “reverse protocol,” which simply calls for the experimenter to
execute the instructions backward. In practice, that involves time-reversing the
control parameters (e.g., reversing macroscopic momenta and magnetic fields) and running
them backwards in time, sending

$$\lambda_i(t) \to \bar{\lambda}_i(\tau - t). \tag{2.16}$$

For simplicity we will generally assume that the control parameters are individually
invariant under time-reversal, so we won’t distinguish between $\lambda$ and $\bar{\lambda}$. The
nontrivial aspect of the reverse protocol is then simply exchanging $t$ with $\tau - t$. If the
control parameters are time-independent for the forward protocol, then there will be
no difference between the forward and reverse protocols. This kind of experiment
involves setting up the initial state of the system and then just waiting for a time $\tau$
before making measurements.

Recall that the transition functions $\pi_F$ for the system were defined assuming the
experimenter was following the forward protocol. The reverse protocol is associated
with a set of reverse transition functions $\pi_R$. We define $\pi_R$ in analogy with (2.2) as

$$\pi_R(\bar{x}' \to \bar{x}) \equiv P(X_\tau = \bar{x} \mid X_0 = \bar{x}'; \lambda(\tau - t)), \tag{2.17}$$

normalized as usual so that $\int \pi_R(\bar{x}' \to \bar{x})\, dx = 1$.

We will also need a time-reversed version of the joint distribution $P_F$. As before, let
$\rho_0(x)$ denote the initial distribution, and let $\rho_{\tau|m}(x)$ and $\rho_\tau(x)$ denote the distributions
at time $\tau$ after following the forward protocol with and without Bayesian updates due
to measurement, respectively. Then, following (2.5) and (2.13), define

$$P_R(x, x') \equiv \rho_\tau(x')\, \pi_R(\bar{x}' \to \bar{x}),$$

$$P_{R|m}(x, x') \equiv \rho_{\tau|m}(x')\, \pi_R(\bar{x}' \to \bar{x}). \tag{2.18}$$

Although the reverse transition functions $\pi_R$ are written as functions of the
time-reversed states $\bar{x}$ and $\bar{x}'$, it is straightforward to apply the time-reversal map on these
states to obtain the left-hand side purely as a function of $x$ and $x'$.

It is helpful to think of these reverse joint probabilities in terms of a brand new
experiment that starts fresh and runs for time $\tau$. The initial distribution for this
experiment is given by the final distribution coming from the forward experiment (with
or without updates), and the experiment consists of time-reversing the state, executing
the reverse protocol, and then time-reversing the state once more.

Our formalism should be contrasted with the typical formulation of a reverse
experiment found in the literature. The initial distribution for the reverse experiment
is frequently taken to be the equilibrium distribution for the final choice of control
parameters [5]. The present method is more similar to the formalism of Seifert [16] in
which an arbitrary final distribution, $p_1(x_t)$, is considered.
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Note that in the dehnition of PRln., unlike in (2.13) above, the conditioning on m 
does not affect the transition function ttr. This is because, from the point of view 
of the reverse experiment, the measurement happens at the beginning. But tir is a 
conditional probability which assumes a particular initial state (in this case x'), and 
so the measurement m does not provide any additional information that can possibly 
affect the transition function. Also note the ordering of the arguments as compared 
with Pp in (2.5): the initial state for the reversed experiment is the second argument 
for Pr, while the initial state for the forward experiment is the hrst argument in Pp. 
Finally, we record the useful identity 

Pp\^{x,x') _Pp{x,x') 

PR\ra[X,x') Pr{x,P) 

assuming both sides are well-dehned for the chosen states x and x'. 
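
As a quick numerical sanity check of the identity (2.19), the following minimal sketch builds
the four joint distributions for a hypothetical three-state system. The transition matrices,
the likelihood `lik`, and the assumption that states are invariant under time reversal (so
the bars can be dropped) are illustrative choices, not taken from the text.

```python
# Hypothetical 3-state toy model; every number here is an illustrative assumption.
rho0 = [0.5, 0.3, 0.2]                                      # initial distribution rho_0(x)
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]  # pi_F[x][xp] = pi_F(x -> x')
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]  # pi_R[xp][x] = pi_R(x' -> x)
lik = [0.9, 0.5, 0.1]                                       # P(m | x') for one fixed outcome m
n = len(rho0)
pairs = [(x, xp) for x in range(n) for xp in range(n)]

# Forward joints with and without the Bayesian update, following (2.5) and (2.13).
P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
P_m = sum(P_F[x][xp] * lik[xp] for x, xp in pairs)
P_Fm = [[P_F[x][xp] * lik[xp] / P_m for xp in range(n)] for x in range(n)]

# Final-time marginals of the two forward joints.
rho_tau = [sum(P_F[x][xp] for x in range(n)) for xp in range(n)]
rho_tau_m = [sum(P_Fm[x][xp] for x in range(n)) for xp in range(n)]

# Reverse joints (2.18): the same pi_R is used with and without the update.
P_R = [[rho_tau[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]
P_Rm = [[rho_tau_m[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]

# Largest deviation from P_F|m / P_R|m = P_F / P_R over all pairs of states.
max_gap = max(abs(P_Fm[x][xp] / P_Rm[x][xp] - P_F[x][xp] / P_R[x][xp]) for x, xp in pairs)
print(max_gap)
```

The ratio is unchanged by the update because the likelihood factor $P(m|x')/P(m)$ multiplies
both the forward and the reverse joint distribution.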


2.4 Heat Flow 


The Crooks Fluctuation Theorem [3] relates forward and reverse transition functions
between equilibrium states to entropy production. It can be thought of as arising via
coarse-graining from the “detailed fluctuation theorem,” which relates the probabilities
of individual forward and backward trajectories to the heat generated along the path
through phase space [2, 5]. Outside the context of equilibrium thermodynamics, this
relationship can be thought of as the definition of the “heat flow”:

$$Q[x(t)] \equiv \log \frac{\pi_F[x(t)]}{\pi_R[\bar{x}(\tau - t)]}. \tag{2.20}$$

The quantity $Q[x(t)]$ can be equated with the thermodynamic heat (flowing out of the
system, in this case) in situations where the latter concept makes sense. (More properly,
it is the heat flow in units of the inverse temperature of the heat bath, since $Q[x(t)]$ is
dimensionless.) However, $Q[x(t)]$ is a more general quantity than the thermodynamic
heat; it is well-defined whenever the transition functions exist, including situations far
from equilibrium or without any fixed-temperature heat bath.

In a similar manner, we can use the coarse-grained transition functions (depending
on endpoints rather than the entire path) to define the following useful quantity:

$$\mathcal{Q}(x \to x') \equiv \log \frac{\pi_F(x \to x')}{\pi_R(\bar{x}' \to \bar{x})}. \tag{2.21}$$
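
To see how (2.21) connects to ordinary heat, consider a hedged toy example: a three-state
system with assumed energies `E` (in units where $\beta = 1$), evolved for one step of a lazy
Metropolis chain. We assume a fixed protocol and time-reversal-invariant states, so that
$\pi_R = \pi_F$. Detailed balance then makes the generalized heat equal to the energy lost
by the system, $E(x) - E(x')$.

```python
import math

E = [0.0, 1.0, 2.5]      # assumed energies of a hypothetical 3-state system (beta = 1)
n = len(E)

def metropolis(E):
    """Lazy Metropolis chain: propose each other state with probability 1/n."""
    n = len(E)
    T = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for xp in range(n):
            if xp != x:
                T[x][xp] = min(1.0, math.exp(E[x] - E[xp])) / n
        T[x][x] = 1.0 - sum(T[x])      # leftover probability: stay in place
    return T

pi_F = metropolis(E)
pi_R = pi_F      # assumption: fixed protocol, states even under time reversal

# Generalized heat flow (2.21) for every pair of endpoints.
Q = [[math.log(pi_F[x][xp] / pi_R[xp][x]) for xp in range(n)] for x in range(n)]

# Compare with the energy released by the system in the transition x -> x'.
max_err = max(abs(Q[x][xp] - (E[x] - E[xp])) for x in range(n) for xp in range(n))
print(max_err)
```

The sign convention matches the text: the heat is positive when the system moves to a
lower-energy state, that is, when heat flows out of the system.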


This quantity $\mathcal{Q}$, the “generalized heat flow,” is intuitively a coarse-grained version of
the change in entropy of the environment during the transition $x \to x'$ in the forward
experiment, though it is well-defined whenever the appropriate transition functions
exist. It is this generalized heat flow that will appear in our versions of the Second Law
and the Bayesian Second Law.

3 Second Laws from Relative Entropy


All of the information about forward and reversed transition probabilities of the system
is contained in the joint forward probability distribution $P_F(x,x')$ and reverse
distribution $P_R(x,x')$, defined in (2.5) and (2.18), respectively. The effects of a Bayesian
update on a measurement outcome $m$ are accounted for in the distributions $P_{F|m}(x,x')$
and $P_{R|m}(x,x')$, given in (2.13) and (2.18). The most concise statements of the Second
Law therefore arise from comparing these distributions.

3.1 The Ordinary Second Law from Positivity of Relative Entropy 

The relative entropy, also known as the Kullback-Leibler divergence [15], is a measure
of the distinguishability of two probability distributions:

$$D(p \| q) \equiv \int dx\, p(x) \log \frac{p(x)}{q(x)}. \tag{3.1}$$


In a rough sense, $D(p\|q)$ can be thought of as the amount of information lost by
replacing a true distribution $p$ by an assumed distribution $q$. Relative entropy is nonnegative
as a consequence of the concavity of the logarithm, and only vanishes when its two
arguments are identical. In this sense it is like a distance, but with the key property
that it is asymmetric in $p$ and $q$, as both the definition and the intuitive description
should make clear.

The relative entropy has been used in previous literature to quantify the
information loss due to the stochastic evolution of a system. This has been achieved by
analyzing path-space or phase-space distributions at a fixed time [5, 17, 18]. In a
similar manner, we compute the relative entropy of the forward probability distribution
with respect to the reverse one. However, we think of $P_F(x,x')$ and $P_R(x,x')$ each as
single distributions on the space $\Gamma \times \Gamma$, so that

$$D(P_F \| P_R) = \int dx\, dx'\, P_F(x,x') \log \frac{P_F(x,x')}{P_R(x,x')}. \tag{3.2}$$


Into this we can plug the expressions (2.5) and (2.18) for $P_F$ and $P_R$, as well as the
relations (2.6) between those distributions and the single-time distributions $\rho_0(x)$ and
$\rho_\tau(x')$, to obtain

$$D(P_F \| P_R) = \int dx\, dx'\, P_F(x,x') \left( \log \frac{\rho_0(x)}{\rho_\tau(x')}
+ \log \frac{\pi_F(x \to x')}{\pi_R(\bar{x}' \to \bar{x})} \right) \tag{3.3}$$

$$= S(\rho_\tau) - S(\rho_0) + \int dx\, dx'\, P_F(x,x')\, \mathcal{Q}(x \to x'). \tag{3.4}$$

Here $S$ is the usual Gibbs or Shannon entropy,

$$S(\rho) \equiv -\int dx\, \rho(x) \log \rho(x), \tag{3.5}$$

and $\mathcal{Q}$ is the generalized heat flow defined by (2.21) above. The first two terms in
(3.4) constitute the change in entropy of the system, while the third term represents
an entropy change in the environment averaged over initial and final states. We will
introduce the notation $\langle \cdot \rangle_F$ to denote the average of a quantity with respect to the
probability distribution $P_F$,

$$\langle f \rangle_F \equiv \int dx\, dx'\, P_F(x,x')\, f(x,x'). \tag{3.6}$$

The positivity of the relative entropy (3.2) is therefore equivalent to

$$\Delta S + \langle \mathcal{Q} \rangle_F \geq 0, \tag{3.7}$$

with equality if and only if $P_F = P_R$. This is the simplest form of the Second Law; it
says that the change in entropy of the system is bounded from below by (minus) the
average of the generalized heat $\mathcal{Q}$ with respect to the forward probability distribution.

The result (3.7) is an information-theoretical statement; in the general case we
should not think of $S$ as a thermodynamic entropy or $\langle \mathcal{Q} \rangle_F$ as the expectation value
of a quantity which can be measured in experiments. To recover the thermodynamic
Second Law, we must restrict ourselves to setups in which temperature, heat flow, and
thermodynamic entropy are all well-defined. In this case, we can interpret $\langle \mathcal{Q} \rangle_F$ as the
expected amount of coarse-grained heat flow into the environment. “Coarse-grained”
here refers to the difference between the endpoint-dependent $\mathcal{Q}(x \to x')$ and the fully
path-dependent $Q[x(t)]$ introduced above. By considering the relative entropy of the
forward path-space probability $P_F[x(t)]$ with respect to the reverse one $P_R[x(t)]$, we
can recover the ordinary Second Law with the ordinary heat term, obtained from (3.7)
by the replacement $\mathcal{Q} \to Q$. We will have more to say about the relationship between
these two forms of the ordinary Second Law in the following section.
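
The decomposition behind (3.7) can be checked numerically. The sketch below uses a
hypothetical three-state system with arbitrarily chosen transition matrices and assumes
that states are invariant under time reversal; it compares $\Delta S + \langle \mathcal{Q} \rangle_F$
with $D(P_F \| P_R)$ computed directly from the joint distributions.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.3, 0.2]
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]   # pi_F[x][xp], rows sum to 1
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]   # pi_R[xp][x], rows sum to 1
n = len(rho0)
pairs = [(x, xp) for x in range(n) for xp in range(n)]

def entropy(p):
    """Shannon entropy S(p) = -sum p log p, the discrete analogue of (3.5)."""
    return -sum(v * math.log(v) for v in p if v > 0)

# Joint forward distribution (2.5) and its final-time marginal (2.6).
P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
rho_tau = [sum(P_F[x][xp] for x in range(n)) for xp in range(n)]
# Joint reverse distribution (2.18), stored as P_R[x][xp].
P_R = [[rho_tau[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]

# Relative entropy (3.2), average generalized heat, and entropy change.
D_FR = sum(P_F[x][xp] * math.log(P_F[x][xp] / P_R[x][xp]) for x, xp in pairs)
avg_Q = sum(P_F[x][xp] * math.log(pi_F[x][xp] / pi_R[xp][x]) for x, xp in pairs)
delta_S = entropy(rho_tau) - entropy(rho0)

print(delta_S + avg_Q, D_FR)
```

Either term on the left may be negative on its own; only their sum is constrained.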

3.2 A Refined Second Law from Monotonicity of Relative Entropy 

Given any pair of probability distributions $p(x,y)$, $q(x,y)$ on multiple variables, we
have

$$D\big(p(x,y) \,\|\, q(x,y)\big) \geq D\left( \int dy\, p(x,y) \,\Big\|\, \int dy\, q(x,y) \right). \tag{3.8}$$

This property is known as the monotonicity of relative entropy. To build intuition, it
is useful to first consider a more general property of the relative entropy:

$$D(p \| q) \geq D(Wp \| Wq) \quad \forall\, W, \tag{3.9}$$

where $W$ is a probability-conserving (i.e., stochastic) operator. This result follows
straightforwardly from the definition of relative entropy and the concavity of the
logarithm. In words, it means that performing any probability-conserving operation $W$ on
probability distributions $p$ and $q$ can only reduce their relative entropy.

In information theory, (3.9) is known as the Data Processing Lemma [19-21], since
it states that processing a signal can only decrease its information content.
Marginalizing over a variable is one such way of processing (it is probability-conserving by the
definition of $p$ and $q$), so marginalization, in particular, cannot increase the relative
information. Intuitively, (3.8) says that marginalizing over one variable cannot increase the
amount of information lost when one approximates $p$ with $q$.
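
A small numerical illustration of (3.9), assuming discrete distributions and an arbitrarily
chosen row-stochastic matrix `W` that maps three states onto two (the specific numbers
are hypothetical):

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions, as in (3.1)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def apply(W, p):
    """Push p through a row-stochastic matrix: (Wp)[j] = sum_i p[i] * W[i][j]."""
    return [sum(p[i] * W[i][j] for i in range(len(p))) for j in range(len(W[0]))]

p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]
W = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]   # assumed stochastic map from 3 states to 2

before = kl(p, q)
after = kl(apply(W, p), apply(W, q))
print(before, after)
```

Marginalization is the special case in which each row of `W` contains a single 1, sending
every joint state $(x, y)$ to its $x$ component.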

Our single-time probability distributions $\rho_t(x)$ can be thought of as marginalized
versions of the joint distribution $P_F(x,x')$, following (2.6). We can also define a new
“cycled” distribution by marginalizing $P_R(x,x')$ over $x'$ to obtain

$$\tilde{\rho}(x) \equiv \int dx'\, P_R(x,x') = \int dx'\, \rho_\tau(x')\, \pi_R(\bar{x}' \to \bar{x}). \tag{3.10}$$

This is the probability distribution we find at the conclusion of the reversed experiment,
or, in other words, after running through a complete cycle of evolving forward,
time-reversing the state, evolving with the reverse protocol, and then time-reversing once
more. In the absence of environmental interaction, we expect the cycled distribution to
match up with the initial distribution $\rho_0(x)$, since the evolution of an isolated system
is completely deterministic.

Applying monotonicity to $P_F$ and $P_R$ by marginalizing over the final state $x'$, we
have

$$D(P_F \| P_R) \geq D(\rho_0 \| \tilde{\rho}) \geq 0, \tag{3.11}$$

or simply, using the results of the previous subsection,

$$\Delta S + \langle \mathcal{Q} \rangle_F \geq D(\rho_0 \| \tilde{\rho}) \geq 0. \tag{3.12}$$

This is a stronger form of the ordinary Second Law. It states that the change in
entropy is bounded from below by an information-theoretic quantity that characterizes
the difference between the initial distribution $\rho_0$ and a cycled distribution $\tilde{\rho}$ that has
been evolved forward and backward in time.

In the context of a numerical simulation, it is easier to calculate $D(\rho_0 \| \tilde{\rho})$ than
$D(P_F \| P_R)$, since the former only depends on knowing the probability distribution of
the system at two specified points in time. $D(\rho_0 \| \tilde{\rho})$ can readily be calculated by
evolving the distribution according to the forward and reverse protocols. This is in contrast
with $D(P_F \| P_R)$, the computation of which requires knowledge of joint probability
distributions. Obtaining the joint distributions is more difficult, because one must know
how each microstate at the given initial time relates to the microstates of the future
time. This bound therefore provides an easily-calculable constraint on the full behavior
of the system.
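
The following sketch illustrates this point for a hypothetical three-state system (arbitrary
transition matrices, states assumed invariant under time reversal). The bound needs only
two matrix-vector products, whereas the full relative entropy needs the joint distributions.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.3, 0.2]
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]   # forward, rows sum to 1
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]   # reverse, rows sum to 1
n = len(rho0)

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def evolve(p, T):
    """Propagate a distribution through a row-stochastic transition matrix."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]

# Cheap route: evolve forward, then with the reverse protocol, as in (3.10).
rho_tau = evolve(rho0, pi_F)
rho_cycled = evolve(rho_tau, pi_R)
bound = kl(rho0, rho_cycled)

# Expensive route: build both joint distributions and compare them directly.
D_FR = sum(rho0[x] * pi_F[x][xp]
           * math.log(rho0[x] * pi_F[x][xp] / (rho_tau[xp] * pi_R[xp][x]))
           for x in range(n) for xp in range(n))

print(D_FR, bound)
```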

Monotonicity of the relative entropy also allows us to succinctly state the
relationship between the path-space and endpoint-space formulations of the Second Law.
Indeed, the relationship between the probabilities $P_F[x(t)]$ and $P_F(x,x')$ is

$$P_F(x,x') = \int_{x(0)=x}^{x(\tau)=x'} \mathcal{D}x(t)\; P_F[x(t)], \tag{3.13}$$

with a similar relationship between the reversed quantities. Monotonicity of relative
entropy then implies that

$$D\big(P_F[x(t)] \,\|\, P_R[x(t)]\big) \geq D\big(P_F(x,x') \,\|\, P_R(x,x')\big). \tag{3.14}$$

Since the changes in entropy are the same, this inequality reduces to the relationship
$\langle Q[x(t)] \rangle_F \geq \langle \mathcal{Q}(x \to x') \rangle_F$ between the expected heat transfer and the expected
coarse-grained heat transfer, which can also be shown directly with a convexity argument.
The point here is that the path-space and endpoint-space formulations of the ordinary
Second Law (as well as the Bayesian Second Law in the following section) are not
independent of each other. Endpoint-space is simply a coarse-grained version of
path-space, and the monotonicity of relative entropy tells us how the Second Law behaves
with respect to coarse-graining.

4 The Bayesian Second Law 

Now we are ready to include Bayesian updates. It is an obvious extension of the
discussion above to consider the relative entropy of the updated joint probabilities
$P_{F|m}$ and $P_{R|m}$, which is again non-negative:

$$D(P_{F|m} \| P_{R|m}) \geq 0. \tag{4.1}$$

This is the most compact form of the Bayesian Second Law (BSL).

4.1 Cross-Entropy Formulation of the BSL 

It will be convenient to expand the definition of relative entropy in several different
ways. First, we can unpack the relative entropy to facilitate comparison with the
ordinary Second Law:

$$D(P_{F|m} \| P_{R|m}) = \int dx\, \rho_{0|m}(x) \log \rho_0(x)
- \int dx'\, \rho_{\tau|m}(x') \log \rho_\tau(x') + \langle \mathcal{Q} \rangle_{F|m}. \tag{4.2}$$

Here we have used the expressions (2.13) and (2.18) for the joint distributions, as well
as the identity (2.19). We have also extracted the generalized heat term,

$$\langle \mathcal{Q} \rangle_{F|m} \equiv \int dx\, dx'\, P_{F|m}(x,x')
\log \frac{\pi_F(x \to x')}{\pi_R(\bar{x}' \to \bar{x})}, \tag{4.3}$$

which is the expected transfer of generalized heat out of the system during the forward
experiment given the final measurement outcome. This is an experimentally measurable
quantity in thermodynamic setups: the heat transfer is measured during each trial
of the experiment, and $\langle \mathcal{Q} \rangle_{F|m}$ is the average over the subset of trials for which the
measurement outcome was $m$. The remaining two terms are not identifiable with a
change in entropy, but we have a couple of options for interpreting them.

The form of (4.2) naturally suggests use of the cross entropy between two
distributions, defined as

$$H(p, q) \equiv -\int dx\, p(x) \log q(x). \tag{4.4}$$

(Note that this is not the joint entropy, defined for a joint probability distribution $p(x,y)$
as $-\int dx\, dy\, p(x,y) \log p(x,y)$.) Using this definition, the relative entropy between the
updated joint distributions (4.2) may be rewritten in the form

$$D(P_{F|m} \| P_{R|m}) = H(\rho_{\tau|m}, \rho_\tau) - H(\rho_{0|m}, \rho_0) + \langle \mathcal{Q} \rangle_{F|m}. \tag{4.5}$$

The Bayesian Second Law is then

$$\Delta H(\rho_m, \rho) + \langle \mathcal{Q} \rangle_{F|m} \geq 0. \tag{4.6}$$

Here, $\Delta$ is the difference in the values of a quantity evaluated at the final time $\tau$ and
the initial time 0.
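
As a hedged numerical check of the cross-entropy form of the BSL, the sketch below evaluates
the change in cross entropy, the conditioned heat average, and the relative entropy of the
updated joint distributions for a hypothetical three-state system. The matrices, the
likelihood `lik` for a single outcome $m$, and time-reversal-invariant states are all assumptions.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.3, 0.2]
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]
lik = [0.9, 0.5, 0.1]                        # P(m | x') for the observed outcome m
n = len(rho0)
pairs = [(x, xp) for x in range(n) for xp in range(n)]

def cross_entropy(p, q):
    """H(p, q) = -sum p log q, the discrete analogue of (4.4)."""
    return -sum(a * math.log(b) for a, b in zip(p, q) if a > 0)

# Prior and updated forward joints, with their marginals.
P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
P_m = sum(P_F[x][xp] * lik[xp] for x, xp in pairs)
P_Fm = [[P_F[x][xp] * lik[xp] / P_m for xp in range(n)] for x in range(n)]
rho_tau = [sum(P_F[x][xp] for x in range(n)) for xp in range(n)]
rho0_m = [sum(P_Fm[x][xp] for xp in range(n)) for x in range(n)]
rho_tau_m = [sum(P_Fm[x][xp] for x in range(n)) for xp in range(n)]
P_Rm = [[rho_tau_m[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]

delta_H = cross_entropy(rho_tau_m, rho_tau) - cross_entropy(rho0_m, rho0)
avg_Q_m = sum(P_Fm[x][xp] * math.log(pi_F[x][xp] / pi_R[xp][x]) for x, xp in pairs)
D_m = sum(P_Fm[x][xp] * math.log(P_Fm[x][xp] / P_Rm[x][xp]) for x, xp in pairs)

print(delta_H + avg_Q_m, D_m)
```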

To get some intuition for how to interpret this form of the BSL, it is useful to
recall the information-theoretic meaning of the entropy and cross entropy. Given a
probability distribution $p(x)$ over the set of microstates $x$ in a phase space $\Gamma$, we can
define the self-information (or Shannon information, or “surprisal”) associated with
each state,

$$I_p(x) = \log \frac{1}{p(x)}. \tag{4.7}$$

The self-information measures the information we would gain by learning the identity
of the specific microstate $x$. If $x$ is highly probable, it’s not that surprising to find the
system in that state, and we don’t learn that much by identifying it; if it’s improbable
we have learned a great deal. From this perspective, the entropy $S(p) = \int dx\, p(x) I_p(x)$
is the expectation value, with respect to $p(x)$, of the self-information associated with
$p(x)$ itself. It is how much we are likely to learn, on average, by finding out the actual
microstate of the system. In a distribution that is highly peaked in some region, the
microstate is most likely to be in that region, and we don’t learn much by finding it out;
such a distribution has a correspondingly low entropy. In a more uniform distribution,
we always learn something by finding out the specific microstate, and the distribution
has a correspondingly higher entropy.

In contrast, the cross entropy $H(p,q) = \int dx\, p(x) I_q(x)$ is the expectation value with
respect to $p(x)$ of the self-information associated with $q(x)$. Typically $p(x)$ is thought
of as the “true” or “correct” distribution, and $q(x)$ as the “assumed” or “wrong”
distribution. We believe that the probability distribution is given by $q(x)$, when it
is actually given by $p(x)$. The cross entropy is therefore a measure of how likely we
are to be surprised (and therefore learn something) if we were to be told the actual
microstate of the system, given that we might not be using the correct probability
distribution. The cross entropy is large when the two distributions are peaked, but in
different places; that maximizes the chance of having a large actual probability $p(x)$
for a state with a large self-information $I_q(x)$. When the two distributions differ, we
are faced with two distinct sources of uncertainty about the true state of the system:
the fact that there can be uncertainty in the true distribution, and the fact that we are
working with an assumed distribution rather than the true one. Mathematically, this
is reflected in the cross entropy being equal to the entropy of the true distribution plus
the relative entropy:

$$H(p,q) = S(p) + D(p \| q). \tag{4.8}$$

The cross entropy is always greater than or equal to the entropy of the true distribution
(by positivity of relative entropy), and reduces to the ordinary entropy when the two
distributions are the same.

The Bayesian Second Law, then, is the statement that the cross entropy of the
updated (“true”) distribution with respect to the original (“wrong”) distribution, plus
the generalized heat flow, is larger when evaluated at the end of the experiment than
at the beginning. In other words, for zero heat transfer, the expected amount of
information an observer using the original distribution function would learn by being
told the true microstate of the system, conditioned on an observation at the final time,
is larger at the final time than at the initial one.

We note that the quantity $H(\rho_{t|m}, \rho_t)$ only has operational meaning once a
measurement has occurred, since performing the Bayesian update to take the measurement
into account requires knowledge of the actual measurement outcome. The BSL is a
statement about how much an experimenter who knows the measurement outcome
would expect someone who didn’t know the outcome to learn by being told the
microstate of the system. There is therefore not any sense in which one can interpret an
increase of $H(\rho_{t|m}, \rho_t)$ with increasing $t$ as an increase in a dynamical quantity. This
is in contrast with the dynamical interpretation of the monotonic increase in entropy
over time in the ordinary Second Law. It is, in fact, the case that $H(\rho_{t|m}, \rho_t)$ does
increase with increasing $t$ for zero heat transfer, but this increase can only be calculated
retroactively once the measurement has actually been made. Of course, in the case
of a trivial measurement that tells us nothing about the system, the BSL manifestly
reduces to the ordinary Second Law, since $H(p,p) = S(p)$.

4.2 Alternate Formulations of the BSL 

Another natural quantity to extract is the total change in entropy after the two-step
process of time evolution and Bayesian updating, which we will call $\Delta S_m$:

$$\Delta S_m \equiv S(\rho_{\tau|m}) - S(\rho_0). \tag{4.9}$$

This is the actual change in the entropy over the course of the experiment in the mind
of the experimenter, who initially believes the distribution is $\rho_0$ (before the experiment
begins) and ultimately believes it to be $\rho_{\tau|m}$. In terms of this change in entropy, we
have

$$D(P_{F|m} \| P_{R|m}) = \Delta S_m + \langle \mathcal{Q} \rangle_{F|m} + D(\rho_{\tau|m} \| \rho_\tau)
+ \int dx\, \big(\rho_{0|m}(x) - \rho_0(x)\big) \log \rho_0(x). \tag{4.10}$$

The second to last term, $D(\rho_{\tau|m} \| \rho_\tau)$, is the relative entropy of the posterior distribution
at time $\tau$ with respect to the prior distribution; it can be thought of as the amount of
information one gains about the final probability distribution due to the measurement
outcome. This is a natural quantity in Bayesian analysis, called simply the information
gain [22]; maximizing its expected value (and hence the expected information learned
from a measurement) is the goal of Bayesian experimental design [23]. Because it
measures information gained, it tends to be largest when the measurement outcome $m$
was an unlikely one from the point of view of $\rho_\tau$. The final term exactly vanishes in the
special case where the initial probability distribution is constant on its domain, which
is an important special case we will consider in more detail below.

Using (4.10), the positivity of relative entropy is equivalent to

$$\Delta S_m + \langle \mathcal{Q} \rangle_{F|m} \geq -D(\rho_{\tau|m} \| \rho_\tau)
+ \int dx\, \big(\rho_0(x) - \rho_{0|m}(x)\big) \log \rho_0(x). \tag{4.11}$$

The left-hand side of this inequality is similar to that of the ordinary Second Law,
except that the result of the measurement is accounted for. In the event of an unlikely
measurement, we would intuitively expect that it should be allowed to be negative.
Accordingly, on the right-hand side we find that it is bounded from below by a quantity
that can take on negative values. And indeed, the more unlikely the measurement is,
the greater $D(\rho_{\tau|m} \| \rho_\tau)$ is, and thus the more the entropy is allowed to decrease.

Finally, we can expand the relative entropy in terms of $S(\rho_{0|m})$ instead of $S(\rho_0)$.
That is, we define the change in entropy between the initial and final updated
distributions,

$$\Delta S(\rho_m) \equiv S(\rho_{\tau|m}) - S(\rho_{0|m}). \tag{4.12}$$

(Note the distinction between $\Delta S(\rho_m)$ here and $\Delta S_m$ in (4.9).) This is the answer to
the question, “Given the final measurement, how much has the entropy of the system
changed?” Then (4.11) is equivalent to

$$\Delta S(\rho_m) + \langle \mathcal{Q} \rangle_{F|m} \geq D(\rho_{0|m} \| \rho_0) - D(\rho_{\tau|m} \| \rho_\tau). \tag{4.13}$$

This change of entropy can be contrasted with $S(\rho_{\tau|m}) - S(\rho_0)$, which is a statement
about the change in the experimenter’s knowledge of the system before and after the
measurement is performed.

The right-hand side of (4.13) has the interesting property that it is always less than
or equal to zero. This can be shown by taking the difference of the relative entropies
and expressing it in the form

$$D(\rho_{0|m} \| \rho_0) - D(\rho_{\tau|m} \| \rho_\tau)
= \int dx\, dx'\, \frac{\rho_0(x)\, \pi_F(x \to x')\, P(m|x')}{P(m)}
\log \frac{\pi_F(x \to m)}{P(m|x')}. \tag{4.14}$$

We have defined $\pi_F(x \to m) \equiv \int dx'\, \pi_F(x \to x') P(m|x')$ for convenience. It is only
possible to write the difference in this form because the initial and final distributions
are related by evolution (2.14). Using the concavity of the logarithm, it can then be
shown that this quantity is non-positive.
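
The sign of this difference can be checked directly. The sketch below evaluates
$D(\rho_{0|m} \| \rho_0) - D(\rho_{\tau|m} \| \rho_\tau)$ for a hypothetical three-state system and two
assumed measurement outcomes; the transition matrix and likelihoods are illustrative only.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.3, 0.2]
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
liks = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]    # P(m | x') for outcomes m = 0 and m = 1
n = len(rho0)

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
rho_tau = [sum(P_F[x][xp] for x in range(n)) for xp in range(n)]

def rhs_of_413(lik):
    """Information gained about the initial state minus that about the final state."""
    P_m = sum(P_F[x][xp] * lik[xp] for x in range(n) for xp in range(n))
    P_Fm = [[P_F[x][xp] * lik[xp] / P_m for xp in range(n)] for x in range(n)]
    rho0_m = [sum(P_Fm[x][xp] for xp in range(n)) for x in range(n)]
    rho_tau_m = [sum(P_Fm[x][xp] for x in range(n)) for xp in range(n)]
    return kl(rho0_m, rho0) - kl(rho_tau_m, rho_tau)

rhs_values = [rhs_of_413(lik) for lik in liks]
print(rhs_values)
```

The result reflects the fact that a measurement at time $\tau$ can tell us no more about the
initial state than it does about the final state.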

One final point of interest with regard to (4.13) is its average with respect to
measurement outcomes. The inequality is predicated on a specific measurement outcome,
$m$; averaging with respect to the probability of obtaining a given measurement, we find

$$\langle \Delta S(\rho_m) \rangle + \langle \mathcal{Q} \rangle \geq I(X_0; M) - I(X_\tau; M), \tag{4.15}$$

where $I(X_t; M)$ is the mutual information between the microstate of the system at
time $t$ and the measurement outcome. Here the mutual information can be expressed
as the relative entropy of a joint probability distribution to the product of its marginal
distributions, $I(X; M) = D\big(p(x,m) \,\|\, p(x)p(m)\big)$.

Inequalities similar to (4.15) can be found in the existing literature for
nonequilibrium feedback-control, though they are usually written in terms of work and free
energy instead of entropy [6, 24-27]. The novelty of (4.15) stems from the fact that
no explicit feedback-control is performed after the measurement and from the presence of
the term $I(X_0; M)$. This term is the mutual information between the initial microstate
of the system and the measurement outcome and arises because Bayesian updating is
performed at the initial time as well as the final time. Due to this updating, the lower
bound on the entropy production is greater than one would naively suspect without
updating the initial state.

4.3 A Refined BSL from Monotonicity of Relative Entropy 

So far we have rewritten the relative entropy of the forward and reverse distributions
(4.1) in various ways, but there is a refined version of the BSL that we can formulate
using monotonicity of relative entropy, analogous to the refined version of the ordinary
Second Law we derived in Section 3.2. Following the definition of the cycled distribution
$\tilde{\rho}$ in (3.10), we can define an updated cycled distribution by marginalizing the updated
reverse distribution over initial states,

$$\tilde{\rho}_m(x) \equiv \int dx'\, P_{R|m}(x,x') = \int dx'\, \rho_{\tau|m}(x')\, \pi_R(\bar{x}' \to \bar{x}). \tag{4.16}$$

The monotonicity of relative entropy then implies that

$$D(P_{F|m} \| P_{R|m}) \geq D(\rho_{0|m} \| \tilde{\rho}_m). \tag{4.17}$$

This is the refined Bayesian Second Law of Thermodynamics in its most compact form,
analogous to the refined Second Law (3.12).

Expanding the definitions as above, the refined BSL can be written as

$$\Delta H(\rho_m, \rho) + \langle \mathcal{Q} \rangle_{F|m} \geq D(\rho_{0|m} \| \tilde{\rho}_m), \tag{4.18}$$

or equivalently as

$$\Delta S_m + \langle \mathcal{Q} \rangle_{F|m} \geq D(\rho_{0|m} \| \tilde{\rho}_m) - D(\rho_{\tau|m} \| \rho_\tau)
+ \int dx\, \big(\rho_0(x) - \rho_{0|m}(x)\big) \log \rho_0(x). \tag{4.19}$$

From the form of (4.18), we see that the change in the cross entropy obeys a tighter
bound than simple positivity, as long as the cycled distribution deviates from the
original distribution (which it will if the evolution is irreversible).
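
A minimal sketch of the refined bound for a hypothetical three-state system follows. The
transition matrices and likelihood are arbitrary assumptions, and states are taken to be
invariant under time reversal. The updated cycled distribution is obtained by pushing the
updated final distribution through the reverse transition matrix.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.3, 0.2]
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]
lik = [0.9, 0.5, 0.1]                        # P(m | x') for the observed outcome m
n = len(rho0)
pairs = [(x, xp) for x in range(n) for xp in range(n)]

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Updated forward joint and its two marginals.
P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
P_m = sum(P_F[x][xp] * lik[xp] for x, xp in pairs)
P_Fm = [[P_F[x][xp] * lik[xp] / P_m for xp in range(n)] for x in range(n)]
rho0_m = [sum(P_Fm[x][xp] for xp in range(n)) for x in range(n)]
rho_tau_m = [sum(P_Fm[x][xp] for x in range(n)) for xp in range(n)]

# Updated reverse joint and the updated cycled distribution of (4.16).
P_Rm = [[rho_tau_m[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]
rho_cycled_m = [sum(P_Rm[x][xp] for xp in range(n)) for x in range(n)]

D_m = sum(P_Fm[x][xp] * math.log(P_Fm[x][xp] / P_Rm[x][xp]) for x, xp in pairs)
bound = kl(rho0_m, rho_cycled_m)
print(D_m, bound)
```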

Other versions of the Second Law can be obtained from the relative entropy by
inserting different combinations of $P_{F|m}$, $P_{R|m}$, $P_F$, and $P_R$. We have chosen to highlight
$D(P_F \| P_R)$ and $D(P_{F|m} \| P_{R|m})$ because these are the combinations which we can expect
to vanish in the case of perfect reversibility, and thus characterize the time-asymmetry
of the dynamics. Other possibilities, like $D(P_{F|m} \| P_R)$, are always nonzero as long as
information is gained from the measurement.

5 Bayesian Jarzynski Equalities


The Jarzynski equality [1] relates the expectation value of the work done on a system
along paths connecting two equilibrium states (the paths themselves can involve
non-equilibrium processes). Consider two equilibrium states $A$ and $B$, with Helmholtz free
energies $F_A$ and $F_B$, and define $\Delta F \equiv F_B - F_A$. If the work done as the system evolves
from $A$ to $B$ is denoted by $W$, the Jarzynski equality states

$$\langle e^{-W} \rangle = e^{-\Delta F}. \tag{5.1}$$

(As usual we are setting Boltzmann’s constant $k_B$ and the inverse temperature $\beta$ equal
to unity.) This equality represents a non-equilibrium relationship between the set of
all paths between two states and their respective free energies. It can be derived from
the Crooks Fluctuation Theorem [3], which can be written as

$$\frac{P(A \to B)}{P(B \to A)} = e^{W - \Delta F}, \tag{5.2}$$

where $P(A \to B)$ and $P(B \to A)$ are respectively the forward and reverse probabilities
between these equilibrium states. In turn, (5.1) immediately implies the Second Law via
Jensen’s inequality, $\langle e^{x} \rangle \geq e^{\langle x \rangle}$, given that $\Delta S = W - \Delta F$ between equilibrium states.
In this section we show how to derive a number of equalities involving expectation
values of quotients of forward and reverse probabilities, with and without Bayesian
updates. For simplicity we will refer to such relations as “Jarzynski equalities.”

Recall the simple identity (2.19):

$$\frac{P_R(x,x')}{P_F(x,x')} = \frac{P_{R|m}(x,x')}{P_{F|m}(x,x')}
= e^{-\mathcal{Q}(x \to x')}\, \frac{\rho_\tau(x')}{\rho_0(x)}, \tag{5.3}$$

which we have made use of in previous sections. We can obtain a Jarzynski equality by
computing the expectation value of this ratio with respect to $P_F$ (or $P_{F|m}$). Naively,
one would multiply by $P_F$ and find $P_F P_R / P_F = P_R$, but we need to keep track of the
domain of integration: we are only interested in points where $P_F \neq 0$ ($P_{F|m} \neq 0$) when
computing an average with respect to $P_F$ ($P_{F|m}$). So we have, for instance,

$$\left\langle \frac{P_R}{P_F} \right\rangle_F = \int_{P_F \neq 0} dx\, dx'\, P_R(x,x'). \tag{5.4}$$

This integral will be equal to one unless there is a set of zero $P_F$-measure with nonzero
$P_R$-measure. On such a set, the ratio $P_R/P_F$ diverges. Generically this will include all
points where $\rho_0(x)$ vanishes, unless $\mathcal{Q}$ happens to diverge for some choices of $x'$ (e.g.,
one reason for $P_R$ to vanish is that certain transitions are strictly irreversible). Note
that if $\rho_0(x)$ is nowhere zero and $\mathcal{Q}$ does not ever diverge (as in physically relevant
situations), then this integral is equal to one. This is true no matter how small $\rho_0(x)$
is or how large $\mathcal{Q}$, as long as they are nonzero and finite everywhere, respectively. For
this reason, (5.4) generically is equal to one.

The same reasoning holds for the updated probabilities:

$$\left\langle \frac{P_{R|m}}{P_{F|m}} \right\rangle_{F|m} = \int_{P_{F|m} \neq 0} dx\, dx'\, P_{R|m}(x,x'). \tag{5.5}$$

Since the ratio $P_{R|m}/P_{F|m}$ is identical to the ratio $P_R/P_F$, the condition for this integral
to equal one is the same as for the previous integral, which means it is generically so.

To summarize, we have constants $a$, $b_m$ such that

$$\left\langle \frac{P_R}{P_F} \right\rangle_F \equiv a \leq 1, \qquad
\left\langle \frac{P_{R|m}}{P_{F|m}} \right\rangle_{F|m} \equiv b_m \leq 1. \tag{5.6}$$

By perturbing the initial state by an arbitrarily small amount, we can make $P_R/P_F$
finite everywhere (excluding divergences in $\mathcal{Q}$), and so $a \neq 1$ and $b_m \neq 1$ are in some
sense unstable. As with the usual Jarzynski equality, we can use Jensen’s inequality
on each of these to extract a Second Law:

$$D(P_F \| P_R) \geq -\log a \geq 0, \tag{5.7}$$

$$D(P_{F|m} \| P_{R|m}) \geq -\log b_m \geq 0. \tag{5.8}$$

Thus these Jarzynski equalities contain within them the positivity of relative entropy.
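
The role of the integration domain can be made concrete with a hedged toy computation.
In the sketch below (hypothetical three-state system, arbitrary matrices, time-reversal-invariant
states), an initial distribution with full support gives $a = 1$, while one that vanishes on
a state gives $a < 1$, and in both cases $D(P_F \| P_R) \geq -\log a$.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]
n = 3

def jarzynski_constant(rho0):
    """Return (a, D) with a = <P_R / P_F>_F as in (5.4) and D = D(P_F || P_R)."""
    P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
    rho_tau = [sum(P_F[x][xp] for x in range(n)) for xp in range(n)]
    P_R = [[rho_tau[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]
    # Only points with P_F != 0 contribute to an average taken with respect to P_F.
    support = [(x, xp) for x in range(n) for xp in range(n) if P_F[x][xp] > 0]
    a = sum(P_R[x][xp] for x, xp in support)
    D = sum(P_F[x][xp] * math.log(P_F[x][xp] / P_R[x][xp]) for x, xp in support)
    return a, D

a_full, D_full = jarzynski_constant([0.5, 0.3, 0.2])   # rho_0 nonzero everywhere
a_part, D_part = jarzynski_constant([0.5, 0.5, 0.0])   # rho_0 vanishes on one state
print(a_full, a_part, D_part, -math.log(a_part))
```

In the second case the missing weight is exactly the probability that the reverse experiment
ends in the state excluded by $\rho_0$.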

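There are also Jarzynski equalities corresponding to the monotonicity inequalities.
Consider

$$\left\langle \frac{P_R}{P_F} \frac{\rho_0}{\tilde{\rho}} \right\rangle_F
= \int_{P_F \neq 0} dx\, dx'\, \frac{P_R(x,x')}{\int dy'\, P_R(x,y')}\, \rho_0(x) \equiv c \leq 1. \tag{5.9}$$

Applying Jensen’s inequality reproduces the monotonicity result:

$$D(P_F \| P_R) \geq D(\rho_0 \| \tilde{\rho}). \tag{5.10}$$

The refined Bayesian Second Law follows similarly from the Jarzynski equality

$$\left\langle \frac{P_{R|m}}{P_{F|m}} \frac{\rho_{0|m}}{\tilde{\rho}_m} \right\rangle_{F|m} \equiv c_m \leq 1. \tag{5.11}$$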


While we have derived a series of Jarzynski equalities, it is not apparent that these
share the mathematical form of the original Jarzynski equality, (5.1); specifically, we
would like to make contact with an exponential average of thermodynamic quantities.
Making use of the identity (5.3), we may write

$$\left\langle \frac{P_R}{P_F} \right\rangle_F
= \left\langle e^{\log \rho_\tau(x') - \log \rho_0(x) - \beta \mathcal{Q}(x \to x')} \right\rangle_F = a. \tag{5.12}$$

A similar equality may be derived making use of the thermodynamic heat, $Q$,
instead of the coarse-grained heat, $\mathcal{Q}$; however, in doing so, one must introduce an
average over paths. We note that (5.12) takes the form of an exponential of a difference
in self-informations, (4.7), and heat transfer. For initial and final macrostates where
the self-information of each microstate is equal to the macrostate’s entropy, this reduces
to Jarzynski’s original equality. As such, (5.12) is just a restatement of the Jarzynski
equality generalized for non-equilibrium states.

In a similar manner, we also find:

$$\left\langle e^{\log \rho_\tau(x') - \log \rho_0(x) - \beta \mathcal{Q}(x \to x')} \right\rangle_{F|m} = b_m, \tag{5.13}$$

$$\left\langle e^{\log \rho_\tau(x') - \log \rho_0(x) - \beta \mathcal{Q}(x \to x')
+ \log\left(\rho_{0|m}(x) / \tilde{\rho}_m(x)\right)} \right\rangle_{F|m} = c_m. \tag{5.14}$$

We see that (5.13) generalizes the Jarzynski equality to include Bayesian updating
while (5.14) also includes the information loss from the stochastic evolution of the
system. Importantly, (5.13) and (5.14) hold independently for each possible measurement
outcome. Essentially, if we partition a large set of experimental trials based on
measurement outcomes, each subset obeys its own Jarzynski equality, (5.13). However, if
we consider all experimental trials together the Jarzynski equality (5.12) holds. This
leads us to the relation

$$\int dm\, P(m)\, b_m = a. \tag{5.15}$$
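
The consistency between the per-outcome equalities and the unconditioned one can be
illustrated with a hedged toy computation: weighting the constants $b_m$ by the outcome
probabilities $P(m)$ should give back $a$. The three-state system, the two-outcome
likelihood table `liks`, and time-reversal-invariant states are all assumptions made for
this sketch.

```python
# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.5, 0.0]                       # vanishes on one state, so a < 1
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]
liks = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]    # P(m | x'); the two rows sum to 1 per state
n = len(rho0)

P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
rho_tau = [sum(P_F[x][xp] for x in range(n)) for xp in range(n)]
# All likelihoods are positive, so P_F and every P_F|m share the same support.
support = [(x, xp) for x in range(n) for xp in range(n) if P_F[x][xp] > 0]

# Unconditioned constant a = <P_R / P_F>_F.
a = sum(rho_tau[xp] * pi_R[xp][x] for x, xp in support)

# Outcome probabilities P(m) and conditioned constants b_m = <P_R|m / P_F|m>_F|m.
P_m = [sum(rho_tau[xp] * lik[xp] for xp in range(n)) for lik in liks]
b = []
for m, lik in enumerate(liks):
    rho_tau_m = [rho_tau[xp] * lik[xp] / P_m[m] for xp in range(n)]
    b.append(sum(rho_tau_m[xp] * pi_R[xp][x] for x, xp in support))

avg_b = sum(P_m[m] * b[m] for m in range(len(liks)))
print(a, b, avg_b)
```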


6 Applications 

As a way to build some intuition for the Bayesian point of view we have been discussing, 
we will go through a few simple examples and special cases. 

6.1 Special Cases 

Perfect Complete Measurement. If a measurement does not yield any new
information, then the updated probabilities are identical to the prior probabilities and
the Bayesian Second Law reduces to the ordinary Second Law. On the other hand,
consider a measuring device that is able to tell us with certainty what the exact microstate
of the system is at the time of measurement. The outcome $m$ of the experiment is
then a single point in phase space. If we employ such a device, we have the following
simplified expressions:

$$\rho_{0|m}(x) = \frac{\rho_0(x)\, \pi_F(x \to m)}{\rho_\tau(m)}, \tag{6.1}$$

$$\rho_{\tau|m}(x') = \delta(x' - m), \tag{6.2}$$

$$\pi_{F|m}(x \to x') = \delta(x' - m)\, \theta\big(\pi_F(x \to m)\big), \tag{6.3}$$

$$\tilde{\rho}_m(x) = \pi_R(\bar{m} \to \bar{x}). \tag{6.4}$$

Using these simplifications, we find

$$D(P_{F|m} \| P_{R|m}) = D(\rho_{0|m} \| \tilde{\rho}_m), \tag{6.5}$$

so the refined Bayesian Second Law is always saturated. This is because marginalization
of the joint distribution over the final endpoint results in no loss of information: we are
still conditioning on the measurement outcome $m$, which tells us the final endpoint.
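
The saturation can be confirmed in a discrete analogue, where the delta function becomes
a Kronecker delta. The sketch below assumes a hypothetical three-state system with
arbitrary transition matrices, time-reversal-invariant states, and a perfect measurement
that finds the system in state `m = 1`.

```python
import math

# Hypothetical 3-state toy model (all numbers are illustrative assumptions).
rho0 = [0.5, 0.3, 0.2]
pi_F = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
pi_R = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.2, 0.2, 0.6]]
n = len(rho0)
m = 1                                                  # observed final microstate
lik = [1.0 if xp == m else 0.0 for xp in range(n)]     # P(m | x') = delta_{x', m}

def kl(p, q):
    """Relative entropy D(p || q) for discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

P_F = [[rho0[x] * pi_F[x][xp] for xp in range(n)] for x in range(n)]
P_m = sum(P_F[x][xp] * lik[xp] for x in range(n) for xp in range(n))
P_Fm = [[P_F[x][xp] * lik[xp] / P_m for xp in range(n)] for x in range(n)]
rho0_m = [sum(P_Fm[x][xp] for xp in range(n)) for x in range(n)]       # discrete (6.1)
rho_tau_m = [sum(P_Fm[x][xp] for x in range(n)) for xp in range(n)]    # discrete (6.2)
P_Rm = [[rho_tau_m[xp] * pi_R[xp][x] for xp in range(n)] for x in range(n)]
rho_cycled_m = [sum(P_Rm[x][xp] for xp in range(n)) for x in range(n)] # discrete (6.4)

# Joint relative entropy, summed over the support of P_F|m only.
D_m = sum(P_Fm[x][xp] * math.log(P_Fm[x][xp] / P_Rm[x][xp])
          for x in range(n) for xp in range(n) if P_Fm[x][xp] > 0)
bound = kl(rho0_m, rho_cycled_m)
print(D_m, bound)
```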


The Boltzmann Second Law of Thermodynamics. In the Boltzmann
formulation of the Second Law, phase space is partitioned into a set of macrostates. Each
microstate is assigned to a macrostate; the entropy of a microstate $x$ is defined as the
entropy of its associated macrostate $\Sigma(x)$, which is the logarithm of the macrostate’s
phase space volume $|\Sigma|$. We can reproduce this formulation as a special case of the
Bayesian measurement formalism: the measuring device determines which macrostate
the microstate belongs to with absolute certainty. If the measurement outcome $m$
indicates that the system is in some particular macrostate (but doesn’t include any
additional information), we have

$$P(m|x) = 1_m(x) = \begin{cases} 1 & \text{if } x \in m, \\ 0 & \text{if } x \notin m. \end{cases} \tag{6.6}$$

We also choose our initial distribution to be uniform over an initial macrostate $\Sigma_0$:

$$\rho_0(x) = \frac{1}{|\Sigma_0|}\, 1_{\Sigma_0}(x). \tag{6.7}$$

Then we have the identities

$$\langle -\log \rho_0(x) \rangle_{F|m} = \log |\Sigma_0| = S(\rho_0), \tag{6.8}$$

$$\langle -\log \rho_\tau(x') \rangle_{F|m} = -\int dx'\, \rho_{\tau|m}(x') \log \rho_{\tau|m}(x')
+ \int dx'\, \rho_{\tau|m}(x') \log \frac{\rho_{\tau|m}(x')}{\rho_\tau(x')} \tag{6.9}$$

$$= S(\rho_{\tau|m}) + D(\rho_{\tau|m} \| \rho_\tau). \tag{6.10}$$
Then the rehned Bayesian Second Law (4.19) simplihes to 

^Sm + {Q)F\m ^ ^(P0|m||Pm) “ (Pr|m||Pr)• (6.11) 

The left-hand side of this inequality is not quite the same as in the Boltzmann
formulation, because $S(\rho_{\tau|m})$ is not the entropy associated with any of the previously
established macrostates. But we do have the inequality $S(\rho_{\tau|m}) \leq \log|m|$, which is the
entropy of the final macrostate. So the left-hand side of (6.11) can be replaced by the
usual left-hand side of the Boltzmann Second Law while maintaining the inequality.^
The right-hand side of the Boltzmann Second Law is zero, while in (6.11) we have
the difference of two positive terms. The Boltzmann Second Law can be violated by
rare fluctuations, and here we are able to characterize such fluctuations by the fact that
they render the right-hand side of our inequality negative. We can also give an explicit
formula for the term $D(\rho_{\tau|m} \,\|\, \rho_\tau)$ that comes in with a minus sign:

$$D(\rho_{\tau|m} \,\|\, \rho_\tau) = -\log \int_m dx'\, \rho_\tau(x') = -\log P(m) = I_m, \tag{6.12}$$

where $I_m$ is the self-information associated with the measurement outcome $m$. When
the observed measurement is very surprising, the entropy change has the opportunity to
become negative. This gives quantitative meaning to the idea that we gain information
when we observe rare fluctuations to lower-entropy states. In particular, the entropy
change may be negative if the information gain from the measurement is greater than
the information loss due to irreversible dynamics.
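As a quick numerical illustration of (6.12), the sketch below replaces phase space by six
discrete microstates. The distribution `rho_tau` and the choice of macrostate are
hypothetical values picked for illustration; only the indicator likelihood (6.6) and Bayes’s
rule come from the text.

```python
import math

def kl(p, q):
    """Relative entropy D(p||q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical final distribution rho_tau over six microstates.
rho_tau = [0.30, 0.25, 0.20, 0.15, 0.07, 0.03]
# Hypothetical macrostate m: the two least likely microstates.
macrostate = {4, 5}

# Indicator likelihood (6.6) and the probability P(m) of observing m.
likelihood = [1.0 if i in macrostate else 0.0 for i in range(len(rho_tau))]
p_m = sum(l * r for l, r in zip(likelihood, rho_tau))

# Bayesian update of the final distribution.
rho_tau_m = [l * r / p_m for l, r in zip(likelihood, rho_tau)]

info_gain = kl(rho_tau_m, rho_tau)
self_information = -math.log(p_m)
print(f"D(rho_tau|m || rho_tau) = {info_gain:.6f}, -log P(m) = {self_information:.6f}")
```

The two printed numbers agree, and they grow as the observed macrostate is made less
probable, which is the sense in which surprising outcomes allow larger entropy decreases.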


6.2 Diffusion of a Gaussian in n Dimensions. 


As our final analytic example, we consider a dynamical model that can be solved
exactly. Let the configuration space be $\mathbb{R}^n$, and suppose the time evolution of the
probability density is diffusive. That is,

$$\rho_\tau(x') = \int d^n x\, \frac{1}{(2\pi D\tau)^{n/2}}\, e^{-\frac{|x - x'|^2}{2D\tau}}\, \rho_0(x). \tag{6.13}$$

Then we can identify the transition function with the heat kernel:

$$\pi_\tau(x \to x') = \frac{1}{(2\pi D\tau)^{n/2}}\, e^{-\frac{|x - x'|^2}{2D\tau}}. \tag{6.14}$$


^And, as we have discussed previously, the coarse-grained $Q$ can be replaced by the
path-space $Q$ as well.

We will assume for simplicity that the diffusion is unaffected by time reversal, so that
$\pi_F = \pi_R = \pi_\tau$, and that the states $x$ are also unaffected by time reversal. (Alternatively,
we can assume that time reversal is some sort of reflection in $x$. The distributions we
consider will be spherically symmetric, and hence invariant under such reflections.)
Note that since $\pi_\tau(x \to x') = \pi_\tau(x' \to x)$, this implies $Q = 0$. We will analyze the
system without including measurement, again for simplicity, and we will also assume
that the initial density profile is Gaussian with initial width $\sigma$ (the variance in each
dimension). Diffusion causes the Gaussian to spread:

$$\rho_\tau(x) = \frac{1}{(2\pi(\sigma + D\tau))^{n/2}}\, e^{-\frac{|x|^2}{2(\sigma + D\tau)}}. \tag{6.15}$$


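We can also calculate the entropy as a function of time:

$$S(\tau) = \int d^n x\, \frac{1}{(2\pi(\sigma + D\tau))^{n/2}}\, e^{-\frac{|x|^2}{2(\sigma + D\tau)}} \left[ \frac{|x|^2}{2(\sigma + D\tau)} + \frac{n}{2} \log\big(2\pi(\sigma + D\tau)\big) \right] \tag{6.16}$$

$$= \frac{n}{2} + \frac{n}{2} \log\big(2\pi(\sigma + D\tau)\big). \tag{6.17}$$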


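Therefore we have $\Delta S = \frac{n}{2} \log\left(1 + \frac{D\tau}{\sigma}\right)$. The relative entropy
$D(\rho_0 \,\|\, \tilde{\rho})$ is also easy to calculate, since in this case $\tilde{\rho} = \rho_{2\tau}$:

$$D(\rho_0 \,\|\, \tilde{\rho}) = \frac{n}{2} \left[ \log\left(1 + \frac{2D\tau}{\sigma}\right) - \frac{2D\tau}{\sigma + 2D\tau} \right]. \tag{6.18}$$

The refined Second Law from monotonicity of the relative entropy says that
$\Delta S \geq D(\rho_0 \,\|\, \tilde{\rho})$. Let us see how strong this is compared to $\Delta S \geq 0$. For small $\tau$, we have
$D(\rho_0 \,\|\, \tilde{\rho}) \approx n (D\tau/\sigma)^2$, as compared to $\Delta S \approx nD\tau/2\sigma$. So the bound from monotonicity
is subleading in $\tau$, so perhaps not so important. For large $\tau$, though, we have
$D(\rho_0 \,\|\, \tilde{\rho}) \approx \frac{n}{2} \left[ \log\frac{D\tau}{\sigma} - \log\frac{e}{2} \right]$, as compared to $\Delta S \approx \frac{n}{2} \log\frac{D\tau}{\sigma}$. Now the bound is fairly tight, with
the relative entropy matching the leading behavior of $\Delta S$.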
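The two regimes can be seen directly by evaluating the closed-form expressions for $\Delta S$
and (6.18). The sketch below does this for a few values of $\tau$; the parameter values
`n = 3`, `D = 1`, `sigma = 1` are arbitrary choices for illustration and do not come from the
paper.

```python
import math

def delta_s(n, D, tau, sigma):
    """Entropy increase of the spreading Gaussian: (n/2) log(1 + D tau / sigma)."""
    return 0.5 * n * math.log1p(D * tau / sigma)

def rel_entropy(n, D, tau, sigma):
    """Relative entropy (6.18) between rho_0 and the cycled distribution rho_{2 tau}."""
    u = 2.0 * D * tau / sigma
    return 0.5 * n * (math.log1p(u) - u / (1.0 + u))

n, D, sigma = 3, 1.0, 1.0  # illustrative values, not taken from the paper
rows = []
for tau in (1e-3, 1e-2, 1e-1, 1.0, 1e1, 1e2, 1e3):
    ds = delta_s(n, D, tau, sigma)
    d = rel_entropy(n, D, tau, sigma)
    rows.append((tau, ds, d, d / ds))
    print(f"tau={tau:8.3g}  dS={ds:10.5f}  D={d:10.5f}  D/dS={d / ds:6.3f}")
```

The ratio in the last column is close to zero for small $\tau$ and approaches one for large
$\tau$, which is the behavior described above.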


6.3 Randomly Driven Harmonic Oscillator 

As a slightly more detailed - and potentially experimentally realizable - example to
which we can apply the Bayesian Second Law, we consider the harmonic oscillator.
Imagine a single, massive particle confined to a one-dimensional harmonic potential,
with spring constant and potential minimum treated as time-dependent control
parameters, coupled to a heat bath which generates dissipative and fluctuating forces. Such
a system may be described by the Fokker-Planck equation,

$$\frac{\partial \rho(x,p,t)}{\partial t} = \frac{2}{\tau^*}\,\rho(x,p,t) + \left[ k(t)\,[x - z(t)] + \frac{2}{\tau^*}\,p \right] \frac{\partial \rho(x,p,t)}{\partial p} - \frac{p}{M}\,\frac{\partial \rho(x,p,t)}{\partial x} + \frac{2M}{\beta \tau^*}\,\frac{\partial^2 \rho(x,p,t)}{\partial p^2}. \tag{6.19}$$

Here we have defined $\tau^*$ to be the dissipation time-scale, $k(t)$ to be the spring constant,
$z(t)$ to be the location of the potential’s minimum, $M$ to be the mass of the oscillator,
and $\beta$ to be the inverse temperature of the heat bath. For simplicity, we choose to
work in units natural for this system by taking $\beta = 1$, $M = 1$, and $k(t = 0) = 1$. We
also choose $\tau^* = 1$, so that we are in the interesting regime where the dissipation and
oscillation time scales are comparable.

We assume that the experimenter is only capable of measuring the position of the
particle and not its momentum. For a microstate with position $x$, we assume that
$P(m|x)$ is given by a Gaussian distribution in $m$ centered at $x$ with standard deviation
$\sigma = 0.2$. This means that the experimenter is likely to find a measured value $m$ within
a range $\pm 0.2$ of the true position $x$. This measuring device is therefore quite sensitive
when compared to the typical size of thermal fluctuations, which is of order unity.
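To make the measurement model concrete, the sketch below applies Bayes’s rule on a
one-dimensional position grid. It is an idealized illustration, not a reproduction of the
lattice simulation described in this section: the prior is assumed to be the exact thermal
position marginal for $\beta = k = 1$, the outcome $m = 2$ is chosen by hand, and the grid
spacing and range are arbitrary.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Position grid. The momentum factor of a thermal state is unaffected by a
# position measurement, so it is omitted in this sketch.
dx = 0.001
xs = [-8.0 + i * dx for i in range(16001)]

prior = [normal_pdf(x, 0.0, 1.0) for x in xs]        # assumed thermal marginal
m, sigma_meas = 2.0, 0.2                              # assumed outcome, device resolution
likelihood = [normal_pdf(m, x, sigma_meas) for x in xs]

# Bayes's rule: posterior = prior * likelihood / P(m).
p_m = sum(pr * lk for pr, lk in zip(prior, likelihood)) * dx
posterior = [pr * lk / p_m for pr, lk in zip(prior, likelihood)]

post_mean = sum(x * q for x, q in zip(xs, posterior)) * dx
info_gain = sum(q * math.log(q / pr)
                for q, pr in zip(posterior, prior) if q > 0.0) * dx
print(f"posterior mean = {post_mean:.4f}, information gain = {info_gain:.4f} nats")
```

Because the device resolution is much finer than the thermal width, the posterior is pulled
almost all the way to the measured value and the information gain is several nats.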

There is no analytical solution to (6.19) in the regime of interest, so the system 
must be modeled numerically. This can be done by discretizing phase space on a lattice 
and using finite-difference methods to evolve the discrete probability distribution. We 
have performed this process using the finite-volume solver package FiPy [28] for the
Python programming language. To elucidate different aspects of the BSL, we consider 
three different simulated experiments. The phase space evolution of these experiments 
is shown in Figures 3-5, found in Appendix A, while the thermodynamic quantities 
calculated are tabulated in Table 2. The source code which was used to carry out these 
simulations and animations of the evolution are also available.^

We first consider the simple experiment shown in Figure 3. The system begins
in thermal equilibrium (Figure 3a). The experiment is carried out under a “trivial”
protocol, where the experimenter fixes $k(t) = 1$ and $z(t) = 0$. Under this protocol, the
system is allowed to evolve from $t = 0$ to $t = 1$ before a measurement is performed.
As seen in Figure 3b, the thermal distribution is nearly unchanged by this evolution.
(Due to finite-size effects, the thermal distribution is not perfectly stationary.) At the
end of the experiment, a measurement of the position is made and we assume that the
unlikely fluctuation $m = 2$ is observed. The experimenter can then use this information
to perform a Bayesian update on both the initial and final distributions as shown in
Figures 3d and 3e. To evaluate the irreversibility of this experiment, the experimenter
must also examine the time-reversed process. The updated cycled distribution which
results from evolving under the time-reversed protocol is shown in Figure 3f.

^See: http://preposterousuniverse.com/science/BSL/

Quantity                                                                   Figure 3   Figure 4   Figure 5
$S(\rho_0)$                                                                    2.84       0.31       0.31
$S(\rho_\tau)$                                                                 2.91       2.93       2.96
$\Delta S$                                                                     0.07       2.61       2.65
$\langle Q \rangle_F$                                                         -0.04       5.99       7.99
$\Delta S + \langle Q \rangle_F$                                               0.02       8.61      10.64
$D(\rho_0 \| \tilde{\rho})$                                                    0.01       7.68      10.64
$S(\rho_{0|m})$                                                                2.47      -0.43       0.31
$S(\rho_{\tau|m})$                                                             1.23       1.12       1.23
$\Delta S_m$                                                                  -1.61       0.81       0.92
$D(\rho_{0|m} \| \rho_0)$                                                      1.01       0.70     < 0.01
$D(\rho_{\tau|m} \| \rho_\tau)$                                                2.71       1.37       1.24
$H(\rho_{0|m}, \rho_0)$                                                        3.48       0.26       0.31
$H(\rho_{\tau|m}, \rho_\tau)$                                                  3.94       2.49       2.47
$\Delta H$                                                                     0.46       2.23       2.16
$\langle Q \rangle_{F|m}$                                                     -0.40       6.14       8.47
$\Delta H + \langle Q \rangle_{F|m}$                                           0.06       8.36      10.64
$D(\rho_{0|m} \| \tilde{\rho}_m)$                                              0.04       7.65      10.63
LHS of Eqn 4.19                                                               -2.01       6.94       9.39
RHS of Eqn 4.19                                                               -2.03       6.235      9.39
|LHS - RHS| / |LHS|                                                          < 0.01       0.10     < 0.01
$\langle P_R / P_F \rangle_F$                                                  1.00       1.00       1.00
$\langle (P_R / P_F)(\rho_0 / \tilde{\rho}) \rangle_F$                         1.00       1.00       1.00
$\langle P_{R|m} / P_{F|m} \rangle_{F|m}$                                      1.00       1.00       1.00
$\langle (P_{R|m} / P_{F|m})(\rho_{0|m} / \tilde{\rho}_m) \rangle_{F|m}$       1.00       1.00       1.00

Table 2: List of thermodynamic properties calculated for three numerically simulated
experiments.

While this experiment and its protocol are fairly simple, they illustrate several key
features of the Bayesian Second Law. Before the final measurement is performed, the
experimenter would state that $\Delta S = 0.07$. After performing the measurement, this
becomes $\Delta S_m = -1.61$ with a heat transfer of $\langle Q \rangle_{F|m} = -0.40$. Naively using these
updated quantities in (3.7) leads to an apparent violation of the usual Second Law of
Thermodynamics. However, this is remedied when one properly takes into account the
information gained as a result of the measurement. A more careful analysis then shows
$\Delta H = 0.46$ and $D(\rho_{0|m} \,\|\, \tilde{\rho}_m) = 0.04$. As such, we see that (4.18) is satisfied, with
$\Delta H + \langle Q \rangle_{F|m} = 0.06 \geq 0.04$, and that the inequality is very tight.
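The entries of Table 2 for the experiment of Figure 3 can be cross-checked against one
another with a few lines of arithmetic. The sketch below uses only numbers transcribed
from the table; the relations it tests are the decomposition $H(p, q) = S(p) + D(p \| q)$ of
the cross entropy and our own rearrangement of the refined inequality, so they should be
read as consistency checks under those assumptions rather than as the authors’ procedure.
The dictionary keys are hypothetical labels introduced here.

```python
# Entries transcribed from Table 2 for the experiment of Figure 3.
fig3 = {
    "S_0": 2.84, "S_0m": 2.47, "S_tau_m": 1.23,
    "D_0m_0": 1.01, "D_tau_m_tau": 2.71,
    "H_0": 3.48, "H_tau": 3.94, "dH": 0.46,
    "dS_m": -1.61, "Q_Fm": -0.40, "D_0m_cycled": 0.04,
    "lhs_419": -2.01, "rhs_419": -2.03,
}
tol = 0.015  # table entries are rounded to two decimal places


def close(a, b):
    return abs(a - b) <= tol


t = fig3
checks = {
    # Cross entropy decomposes as H(p, q) = S(p) + D(p||q).
    "H_0 = S(rho_0|m) + D(rho_0|m||rho_0)": close(t["H_0"], t["S_0m"] + t["D_0m_0"]),
    "H_tau = S(rho_tau|m) + D(rho_tau|m||rho_tau)":
        close(t["H_tau"], t["S_tau_m"] + t["D_tau_m_tau"]),
    "dH = H_tau - H_0": close(t["dH"], t["H_tau"] - t["H_0"]),
    # The tabulated Delta S_m is measured from the unupdated initial entropy.
    "dS_m = S(rho_tau|m) - S(rho_0)": close(t["dS_m"], t["S_tau_m"] - t["S_0"]),
    "LHS = dS_m + <Q>": close(t["lhs_419"], t["dS_m"] + t["Q_Fm"]),
    # Rearranged refined bound (our algebra, assumed equivalent to (4.18)).
    "RHS = D_cycled - D_tau + (H_0 - S_0)":
        close(t["rhs_419"], t["D_0m_cycled"] - t["D_tau_m_tau"] + t["H_0"] - t["S_0"]),
    "refined BSL: dH + <Q> >= D_cycled":
        t["dH"] + t["Q_Fm"] >= t["D_0m_cycled"] - tol,
}
for name, ok in checks.items():
    print(f"{'ok ' if ok else 'FAIL'}  {name}")
```

All of the relations hold to within rounding, which also shows that the tabulated
$\Delta S_m$ is the difference $S(\rho_{\tau|m}) - S(\rho_0)$.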
We will now consider the same (trivial) protocol with a different initial distribution.
The experimenter knows the initial position of the oscillator and the magnitude, but
not the direction, of its initial momentum with a high degree of certainty. As such,
there are two regions of phase space the experimenter believes the system could be in.
The initial distribution is shown in Figure 4a. The system is then allowed to evolve
until $t = 0.5$ as shown in Figure 4b. At the end of the experiment, the position of the
oscillator is measured to be $m = 2$. The impact of this measurement can be seen in
Figures 4d and 4e.

Due to the outcome of the measurement, the experimenter is nearly certain that
the oscillator had positive initial momentum. One therefore expects this information
gain to be roughly one bit and this is confirmed by $D(\rho_{0|m} \,\|\, \rho_0) = 0.70 \approx \log 2$. Despite
this sizable information gain for the initial distribution, we note that the information
gain for the final distribution is even greater with $D(\rho_{\tau|m} \,\|\, \rho_\tau) = 1.37$. This is expected
because, regardless of the measurement outcome, the experimenter will always gain at
least as much information about the final distribution as about the initial one when
performing a measurement. Evaluating the remaining terms (see Table 2), we once again
find that the BSL is satisfied.
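The “one bit” estimate can be illustrated with a toy version of the two-lobe prior. In the
sketch below the four cells, their prior weights, and the likelihood values are all
hypothetical; the only feature carried over from the experiment is that the two lobes have
equal total weight and that the observed outcome is essentially incompatible with one of
them.

```python
import math

# Hypothetical two-lobe prior on four coarse cells: the first two cells form the
# positive-momentum lobe, the last two the negative-momentum lobe (weight 1/2 each).
prior = [0.3, 0.2, 0.2, 0.3]
# Hypothetical likelihood of the observed outcome given each cell: the outcome is
# (almost) only reachable from the positive-momentum lobe.
likelihood = [0.5, 0.5, 1e-9, 1e-9]

p_m = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / p_m for p, l in zip(prior, likelihood)]

info_gain_nats = sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)
info_gain_bits = info_gain_nats / math.log(2)
print(f"information gain = {info_gain_nats:.4f} nats = {info_gain_bits:.4f} bits")
```

Ruling out one of two equally weighted lobes gives $\log 2 \approx 0.69$ nats, which is the
scale of the value quoted above.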

Lastly, consider an experiment that starts with the same initial state but uses a
non-trivial protocol where the potential is “dragged”. The experimenter keeps $k(t) = 1$
fixed but varies $z(t)$. For times between $t = 0$ and $t = 1$, the experimenter rapidly
drags the system according to $z(t < 1) = 2t$. After this rapid dragging motion, the
experimenter keeps $z(t > 1) = 2$ and allows the system to approach equilibrium until
a measurement is performed at $t = 5$. Importantly, this gives the system a significant
amount of time to reach its new equilibrium distribution before the measurement is
performed. The experimenter then measures the oscillator’s position and finds it to be
centered in the new potential ($m = 2$). The evolution of this system is shown in Figure
5.
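The dragging protocol can be explored with a simple trajectory simulation. The sketch
below is not the lattice Fokker-Planck calculation used for the results in this section: it
integrates underdamped Langevin dynamics with an Euler-Maruyama step, assumes a
friction rate `gamma = 2` (our reading of the dissipation time-scale with $\tau^* = 1$), and,
as a simplification, starts from thermal equilibrium rather than from the two-lobe state.
The function names, step size, and number of trajectories are illustrative choices.

```python
import math
import random


def z_protocol(t):
    """Dragging protocol: z = 2t for t < 1, then held fixed at z = 2."""
    return 2.0 * t if t < 1.0 else 2.0


def simulate(n_traj=1000, t_final=5.0, dt=0.01, gamma=2.0, seed=0):
    """Euler-Maruyama integration of underdamped Langevin dynamics with
    M = beta = k = 1. The friction rate gamma is an assumed parameter."""
    rng = random.Random(seed)
    noise = math.sqrt(2.0 * gamma * dt)  # fluctuation-dissipation with M = beta = 1
    n_steps = int(round(t_final / dt))
    finals = []
    for _ in range(n_traj):
        # Assumed initial condition: thermal equilibrium in the potential at z = 0.
        x, p = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        for step in range(n_steps):
            t = step * dt
            force = -(x - z_protocol(t)) - gamma * p
            x += p * dt
            p += force * dt + noise * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals


positions = simulate()
mean_x = sum(positions) / len(positions)
var_x = sum((x - mean_x) ** 2 for x in positions) / len(positions)
print(f"mean final position = {mean_x:.2f}, variance = {var_x:.2f}")
```

With these assumed parameters the ensemble at $t = 5$ sits close to the new minimum at
$z = 2$ with roughly unit variance, consistent with the statement that the system has had
time to approach its new equilibrium before the measurement.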

Due to the change in protocol, the experimenter gains an appreciable amount 
of information about the final distribution of the system, but negligible information 
about the initial distribution. Specifically, we find that $D(\rho_{\tau|m} \,\|\, \rho_\tau) = 1.24$, while
$D(\rho_{0|m} \,\|\, \rho_0) < 0.01$. This is because the system is given time to fully thermalize before
the measurement, so any information about the initial state is lost by the time the 
measurement is performed. Also of interest is the difference between the forward and 
reverse protocol. As shown in Figures 5a and 5b, the forward protocol results in most 
distributions reaching the new thermal equilibrium. However, the same is not true of 
the reverse protocol: the distributions in Figures 5c and 5f are not near equilibrium. 
This is due to the asymmetry between the forward and reverse protocols. 

We also calculated the quantities appearing in the Bayesian Jarzynski equalities 
derived in Section 5; they appear in Table 2. We find that for all three experimental
protocols considered, these are well defined and equal to unity. 


7 Discussion 


We have shown how to include explicit Bayesian updates due to measurement outcomes
into the evolution of probability distributions obeying stochastic equations of motion,
and derived extensions of the Second Law of Thermodynamics that incorporate such
updates. Our main result is the Bayesian Second Law, which can be written in various
equivalent forms (4.1), (4.6), (4.11), (4.13):

$$D(P_{F|m} \,\|\, P_{R|m}) \geq 0, \tag{7.1}$$

$$\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq 0, \tag{7.2}$$

$$\Delta S_m + \langle Q \rangle_{F|m} \geq -D(\rho_{\tau|m} \,\|\, \rho_\tau) + \int dx\, \big(\rho_0(x) - \rho_{0|m}(x)\big) \log \rho_0(x), \tag{7.3}$$

$$\Delta S(\rho_m) + \langle Q \rangle_{F|m} \geq D(\rho_{0|m} \,\|\, \rho_0) - D(\rho_{\tau|m} \,\|\, \rho_\tau). \tag{7.4}$$


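We also used monotonicity of the relative entropy to derive refined versions of the
ordinary Second Law and the BSL, (3.12) and (4.18):

$$\Delta S + \langle Q \rangle_F \geq D(\rho_0 \,\|\, \tilde{\rho}) \geq 0, \tag{7.5}$$

$$\Delta H(\rho_m, \rho) + \langle Q \rangle_{F|m} \geq D(\rho_{0|m} \,\|\, \tilde{\rho}_m) \geq 0. \tag{7.6}$$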

Finally, we applied similar reasoning to obtain Bayesian versions of the Jarzynski
equality, such as (5.6):



(7.7) 


In the remainder of this section we briefly discuss some implications of these results. 
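The equivalence of the forms (7.1) and (7.2), and the refinement (7.6), can be checked
numerically on a small discrete system. The sketch below is an illustration under stated
assumptions rather than the authors’ code: states are unaffected by time reversal, the
forward and reverse transition matrices and the likelihood are random positive numbers,
the reverse joint distribution is taken to be $P_{R|m}(x,x') = \rho_{\tau|m}(x')\,\pi_R(x' \to x)$, and
the generalized heat is taken to be $Q(x \to x') = \log[\pi_F(x \to x')/\pi_R(x' \to x)]$.

```python
import math
import random

rng = random.Random(1)
N = 5  # number of discrete states (illustrative)


def normalize(v):
    s = sum(v)
    return [a / s for a in v]


def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def cross_entropy(p, q):
    return -sum(a * math.log(b) for a, b in zip(p, q) if a > 0)


def flatten(matrix):
    return [a for row in matrix for a in row]


# Randomly generated, strictly positive ingredients (all hypothetical).
rho0 = normalize([rng.random() + 0.1 for _ in range(N)])
pi_F = [normalize([rng.random() + 0.1 for _ in range(N)]) for _ in range(N)]  # pi_F[x][y]
pi_R = [normalize([rng.random() + 0.1 for _ in range(N)]) for _ in range(N)]  # pi_R[y][x]
like = [rng.random() + 0.1 for _ in range(N)]                                  # P(m|y)

rho_tau = [sum(rho0[x] * pi_F[x][y] for x in range(N)) for y in range(N)]
p_m = sum(rho_tau[y] * like[y] for y in range(N))

# Forward joint distribution conditioned on m, and its two marginals.
P_Fm = [[rho0[x] * pi_F[x][y] * like[y] / p_m for y in range(N)] for x in range(N)]
rho0_m = [sum(P_Fm[x]) for x in range(N)]
rho_tau_m = [sum(P_Fm[x][y] for x in range(N)) for y in range(N)]

# Reverse joint distribution started from the updated final state, and the cycled state.
P_Rm = [[rho_tau_m[y] * pi_R[y][x] for y in range(N)] for x in range(N)]
rho_cycled_m = [sum(P_Rm[x]) for x in range(N)]

form_71 = kl(flatten(P_Fm), flatten(P_Rm))
delta_H = cross_entropy(rho_tau_m, rho_tau) - cross_entropy(rho0_m, rho0)
mean_Q = sum(P_Fm[x][y] * math.log(pi_F[x][y] / pi_R[y][x])
             for x in range(N) for y in range(N))
form_72 = delta_H + mean_Q
refinement = kl(rho0_m, rho_cycled_m)
print(f"(7.1): {form_71:.6f}  (7.2): {form_72:.6f}  D(rho_0|m || cycled): {refinement:.6f}")
```

The first two numbers coincide, and both are bounded below by the third, which is itself
non-negative.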

Downward fluctuations in entropy. As mentioned in the Introduction, there is
a tension between a Gibbs/Shannon information-theoretic understanding of entropy
and the informal idea that there are rare fluctuations in which entropy decreases. The
latter phenomenon is readily accommodated by a Boltzmannian definition of entropy
using coarse-graining into macrostates, but it is often more convenient to work with
distribution functions $\rho(x)$ on phase space, in terms of which the entropy of a system
with zero heat flow will either increase or remain constant.

The BSL resolves this tension. The post-measurement entropy of the updated
distribution $\rho_{\tau|m}$ can be less than the entropy of the original starting distribution
$\rho_0$, as the right-hand side of (7.3) can be negative. On the rare occasions when that
happens, there is still a lower bound on their difference. From the information-theoretic
perspective, downward fluctuations in entropy at zero heat flow are necessarily associated
with measurements.

This perspective is also clear from the refined Bayesian version of the Boltzmann
Second Law (6.11), in which the right-hand side can be of either sign. We can see
that downward fluctuations in entropy at zero heat flow occur when the amount of
information gained by the experimenter exceeds the amount of information lost due to
irreversible dynamics.

The usefulness of the BSL is not restricted to situations in which literal observers
are making measurements of the system. We might be interested in fluctuating biological
or nanoscale systems in which a particular process of interest necessarily involves a
downward fluctuation in entropy. In such cases, even if there are no observers around to
witness the fluctuation, we may still be interested in conditioning on histories in which
such fluctuations occur, and asking questions about the evolution of entropy along the
way. The BSL can be of use whenever we care about evolution conditioned on certain
measurement outcomes.

The Bayesian arrow of time. Shalizi [29] has previously considered the evolution
of conservative systems with Bayesian updates. For a closed, reversible system,
the Shannon entropy remains constant over time, as the distribution evolves in accordance
with Liouville’s Theorem. If we occasionally observe the system and use Bayes’s
rule to update the distribution, our measurements will typically cause the entropy to
decrease, because conditioning reduces entropy when averaged over measurement
outcomes, $\langle S(\rho_m) \rangle_m \leq S(\rho)$. At face value, one might wonder about an apparent conflict
between this fact and the traditional understanding of the arrow of time, which is based
on entropy increasing over time. This should be a minor effect in realistic situations,
where systems are typically open and ordinary entropy increase is likely to swamp any
decrease due to conditioning, but it seems like a puzzling matter of principle.
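The statement $\langle S(\rho_m) \rangle_m \leq S(\rho)$ is easy to illustrate on a toy system. In the sketch
below the prior over four microstates and the noisy two-outcome measurement are
hypothetical numbers chosen for illustration; only Bayes’s rule and the definition of the
Shannon entropy are taken from the text.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(a * math.log(a) for a in p if a > 0)


# Hypothetical prior over four microstates.
rho = [0.4, 0.3, 0.2, 0.1]
# Hypothetical noisy measurement: likelihood[m][x] = P(m|x); for each x the
# two outcome probabilities sum to one.
likelihood = {
    "low":  [0.9, 0.7, 0.3, 0.1],
    "high": [0.1, 0.3, 0.7, 0.9],
}

prior_entropy = entropy(rho)
posterior_entropies = {}
avg_posterior_entropy = 0.0
for m, lk in likelihood.items():
    p_m = sum(l * r for l, r in zip(lk, rho))
    posterior = [l * r / p_m for l, r in zip(lk, rho)]
    posterior_entropies[m] = entropy(posterior)
    avg_posterior_entropy += p_m * posterior_entropies[m]

print(f"S(rho) = {prior_entropy:.4f}")
for m, s in posterior_entropies.items():
    print(f"S(rho_m) for m = {m}: {s:.4f}")
print(f"average over outcomes = {avg_posterior_entropy:.4f}")
```

In this particular example one of the two outcomes leaves the experimenter with a broader
distribution than the prior, yet the outcome-averaged entropy still goes down, so the
inequality is a statement about the average and not about each outcome.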

Our analysis suggests a different way of addressing such situations: upon making
a measurement, we can update not only the current distribution function, but the
distribution function at all previous times as well. As indicated by (7.4), the entropy
of the updated distribution can decrease even at zero heat transfer. We have identified,
however, a different quantity, the cross entropy $H(\rho_m, \rho)$ of the updated distribution
with respect to the unupdated one, which has the desired property of never decreasing
(7.2). For a closed system, both the updated entropy and the cross entropy will remain
constant; for open systems the cross entropy will increase. It is possible to learn about
a system by making measurements, but we will always know as much or more about
systems in the past than we do about them in the present.
Statistical physics of self-replication. The application of statistical mechanics
to the physics of self-replicating biological systems by England [7] was one of the
inspirations for this work. England considers the evolution of a system from an initial
macrostate, I, to a final macrostate, II, and finds an inequality which bounds from
below the sum of the heat production and change in entropy by a quantity related to the
transition probabilities between the two macrostates. This inequality, however, does
not explicitly make use of a Bayesian update based on the observation of the system’s
final macrostate: as we have seen previously, the inclusion of Bayesian updates can
significantly change one’s interpretation of the entropy production.

In seeking to interpret England’s inequality within our framework, we consider the
form of the BSL in an experiment where the initial distribution has support only on
the initial macrostate, and the measurement at the conclusion determines the final
macrostate. This is a slight generalization of the Boltzmann setup considered in
Section 6.1 above. We then have the option to consider the difference between the entropy
of the updated final distribution and the entropy of either the updated or unupdated
initial distribution.

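First, making use of the unupdated initial state, it can be shown that

$$S(\rho_{\tau|\mathrm{II}}) - S(\rho_0) + \langle Q \rangle_{F|\mathrm{II}} \geq -\log \frac{\pi(\mathrm{II} \to \mathrm{I})}{\pi(\mathrm{I} \to \mathrm{II})} + S(\rho_{0|\mathrm{II}}) - S(\rho_0). \tag{7.8}$$

This inequality is similar in spirit to England’s: when $S(\rho_{0|\mathrm{II}}) \geq S(\rho_0)$, England’s
inequality immediately follows. Alternatively, using the updated initial state, we find

$$S(\rho_{\tau|\mathrm{II}}) - S(\rho_{0|\mathrm{II}}) + \langle Q \rangle_{F|\mathrm{II}} \geq D(\rho_{0|\mathrm{II}} \,\|\, \tilde{\rho}_{\mathrm{II}}) + D(\rho_{0|\mathrm{II}} \,\|\, \rho_0) - D(\rho_{\tau|\mathrm{II}} \,\|\, \rho_\tau) \geq -\log \frac{\pi(\mathrm{II} \to \mathrm{I})}{\pi(\mathrm{I} \to \mathrm{II})}. \tag{7.9}$$

This differs from England’s result only in that the entropy of the initial state has
been replaced by the entropy of the updated initial state. Making this adjustment to
England’s inequality, we recover his bound from the bound given by the BSL. (We
thank Timothy Maxwell for proving this relation.)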

Future directions. In this paper we have concentrated on incorporating Bayesian
updates into the basic formalism of statistical mechanics, but a number of generalizations
and applications present themselves as directions for future research. Potential
examples include optimization of work-extraction (so-called “Maxwell’s demon”
experiments) and cooling in nanoscale systems, as well as possible applications to biological
systems. It would be interesting to experimentally test the refined versions of the
ordinary and Bayesian Second Laws, to quantify how close the inequalities are to being
saturated. We are currently working to extend the BSL to quantum systems.
A Oscillator Evolution


Here we show plots of the distribution functions for the three numerical
harmonic-oscillator experiments discussed in Section 6.3.





[Figure 3 panels: (a) Initial distribution; (b) Final distribution; (c) Cycled distribution;
(d) Updated initial distribution; (e) Updated final distribution; (f) Updated cycled
distribution.]

Figure 3: Evolution of a damped harmonic oscillator coupled to a heat bath in initial
thermal equilibrium under a trivial protocol. Units are chosen such that $M = 1$,
$k(t = 0) = 1$, and $\beta = 1$. Each graph shows the phase space probability distribution
with respect to position and momentum at different points in the experiment.
[Figure 4 panels: (d) Updated initial distribution; (e) Updated final distribution;
(f) Updated cycled distribution.]

Figure 4: Evolution of a damped harmonic oscillator coupled to a heat bath with
known position and magnitude of momentum under a trivial protocol. Units are chosen
such that $M = 1$, $k(t = 0) = 1$, and $\beta = 1$. Each graph shows the phase space
probability distribution with respect to position and momentum at different points in
the experiment.
Figure 5: Evolution of a damped harmonic oscillator coupled to a heat bath in initial
thermal equilibrium under a “dragging” protocol. Units are chosen such that $M = 1$,
$k(t = 0) = 1$, and $\beta = 1$. Each graph shows the phase space probability distribution
with respect to position and momentum at different points in the experiment.